Mono csharp AOT Optimizations

Question

I am starting to learn C#, coming from a C++ background, and wanted to compare the C# memory model with that of C++. Along the way I found the article "The C# Memory Model in Theory and in Practice". Nothing in it has been especially surprising so far, but when I tried to reproduce the compiler optimization it describes, in which a redundant memory read is removed, I could not. I'm using the Mono C# compiler. Here is the C# code:

// test.cs
using System;

class MainApp {
  static void Main() {
    Foo foo = new Foo();
    foo.bar();
  }
}

class Foo {
  private int _A = 0, _B = 1;
  public bool bar() {
    if (_B == -1) throw new Exception();
    int a = _A;
    int b = _B;
    return a > b;
  }
}

I then run the following compilation commands:

mcs -optimize+ test.cs
mono --aot -O=all test.exe

When I check the output of objdump -d test.exe.so, I see the following (relevant) lines of assembly:

0000000000000500 <Foo_bar>:
 500:   48 83 ec 08             sub    $0x8,%rsp
 504:   48 89 3c 24             mov    %rdi,(%rsp)
 508:   48 8b c7                mov    %rdi,%rax
 50b:   48 63 40 14             movslq 0x14(%rax),%rax
 50f:   83 f8 ff                cmp    $0xffffffff,%eax
 512:   74 1b                   je     52f <Foo_bar+0x2f>
 514:   48 8b 0c 24             mov    (%rsp),%rcx
 518:   48 63 41 10             movslq 0x10(%rcx),%rax
 51c:   48 63 49 14             movslq 0x14(%rcx),%rcx
 520:   3b c1                   cmp    %ecx,%eax
 522:   40 0f 9f c0             setg   %al
 526:   48 0f b6 c0             movzbq %al,%rax
 52a:   48 83 c4 08             add    $0x8,%rsp
 52e:   c3                      retq  

 ... # exception stuff

So the loads at 50b, 518, and 51c indicate that _B is still read twice (at 50b and again at 51c), even though the second read could be optimized out. My question: am I doing something wrong, is this a missed optimization, or is there a good reason why this optimization does not take place here? I don't have access to Visual Studio right now, so I would also be interested to hear whether its compiler actually performs this optimization.

The article claims I should get something like:

push        eax
mov         edx,dword ptr [ecx+8]
cmp         edx,0FFFFFFFFh
je          00000016
mov         eax,dword ptr [ecx+4]
cmp         eax,edx
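
In source terms, the transformation described there amounts to reading _B once and reusing that value for the later comparison. A minimal C++ sketch of the two forms follows; the names FooSketch, bar_two_reads, bar_one_read, and check_same_result are hypothetical and exist only for illustration. In single-threaded code the two forms give the same result, which is what would allow a compiler to turn the first into the second:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for Foo; all names here are illustrative only.
struct FooSketch {
  int a = 0;
  int b = 1;

  // As written in the source: b is read twice.
  bool bar_two_reads() const {
    if (b == -1) throw std::runtime_error("b is -1");  // first read of b
    int x = a;
    int y = b;  // second read of b
    return x > y;
  }

  // What the optimized code computes: b is read once and the value reused.
  bool bar_one_read() const {
    int y = b;  // single read of b
    if (y == -1) throw std::runtime_error("b is -1");
    int x = a;
    return x > y;
  }
};

// Single-threaded sanity check: both forms agree on the same object.
inline void check_same_result() {
  FooSketch f;
  assert(f.bar_two_reads() == f.bar_one_read());
}
```

The two forms can only differ if another thread writes b between the two reads, which is exactly the case a memory model has to rule in or out.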

I decided to check whether the situation was different for C++, and was a little surprised by what I found. The following code:

class Foo {
  int _A{0};
  int _B{1};

  public:
  __attribute__ ((noinline)) bool bar() volatile {
    if (_B == -1) throw 0;
    int a = _A;
    int b = _B;
    return a > b;
  }
};

int main(int argc, char **argv) {
  volatile Foo foo;
  foo.bar();
}

Leads to the following (relevant) assembly:

00000000000007e4 <_ZNV3Foo3barEv>:
 7e4:   8b 47 04                mov    0x4(%rdi),%eax
 7e7:   83 f8 ff                cmp    $0xffffffff,%eax
 7ea:   74 0b                   je     7f7 <_ZNV3Foo3barEv+0x13>
 7ec:   8b 17                   mov    (%rdi),%edx
 7ee:   8b 47 04                mov    0x4(%rdi),%eax
 7f1:   39 c2                   cmp    %eax,%edx
 7f3:   0f 9f c0                setg   %al
 7f6:   c3                      retq   

 ... # exception stuff

So the optimization doesn't take place here either (although, to be fair, without the __attribute__((noinline)) this code doesn't even show up in the object file). Here's the same from clang:

0000000000400610 <_ZNV3Foo3barEv>:
  400610:   50                      push   %rax
  400611:   8b 47 04                mov    0x4(%rdi),%eax
  400614:   83 f8 ff                cmp    $0xffffffff,%eax
  400617:   74 0a                   je     400623 <_ZNV3Foo3barEv+0x13>
  400619:   8b 07                   mov    (%rdi),%eax
  40061b:   3b 47 04                cmp    0x4(%rdi),%eax
  40061e:   0f 9f c0                setg   %al
  400621:   59                      pop    %rcx
  400622:   c3                      retq   

 ... # exception stuff

So there's still an extra read, just "inlined" in the cmp instruction.
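
One caveat about the C++ version: both the object and bar() are volatile-qualified, and C++ treats every access through a volatile glvalue as an observable side effect, so a conforming compiler is not allowed to merge or drop those reads. A sketch of the same class without volatile is below. PlainFoo, its two-argument constructor, and plain_foo_smoke_test are hypothetical names added for illustration, and the fields are renamed a_ and b_ because identifiers such as _A and _B are reserved in C++. This is the variant in which the compiler is free to reuse the first load; whether g++ or clang actually does so at a given -O level has not been verified here and would need to be confirmed with objdump.

```cpp
#include <cassert>

// Hypothetical non-volatile counterpart of Foo (names are illustrative).
class PlainFoo {
  int a_{0};
  int b_{1};

 public:
  PlainFoo() = default;
  PlainFoo(int a, int b) : a_(a), b_(b) {}

  // noinline (a GCC/clang extension, as in the original) keeps the
  // function body visible in the object file for inspection.
  __attribute__((noinline)) bool bar() const {
    if (b_ == -1) throw 0;
    int a = a_;
    int b = b_;  // not volatile: the compiler may reuse the earlier load
    return a > b;
  }
};

// Small behavioural check; it says nothing about the generated assembly.
inline void plain_foo_smoke_test() {
  PlainFoo f;
  assert(!f.bar());  // 0 > 1 is false
}
```

Compiling this with optimizations enabled and disassembling bar() in the same way as above is how the number of loads of b_ could be compared against the volatile version.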

Relevant --version

Mono C# compiler version 4.6.2.0
g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)

Tags: c#, c++, mono, compiler-optimization

asked on Stack Overflow Feb 27, 2020 by

Nathan Chappell

1 Answer

I threw your code at SharpLab, and got:

    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov eax, [rcx+0xc]
    L0008: cmp eax, 0xffffffff
    L000b: jz L001e
    L000d: mov ecx, [rcx+0x8]
    L0010: cmp ecx, eax
    L0012: setg al
    L0015: movzx eax, al
    L0018: add rsp, 0x20
    L001c: pop rsi
    L001d: ret
    L001e: mov rcx, 0x7ffa2f8d4170
    L0028: call 0x7ffa8f384690
    L002d: mov rsi, rax
    L0030: mov rcx, rsi
    L0033: call System.Exception..ctor()
    L0038: mov rcx, rsi
    L003b: call 0x7ffa8f33a4f0
    L0040: int3

My assembly is rustier than yours, but as far as I can see, each field is accessed only once.

Note that the .NET Core JIT is 2-tier and SharpLab only shows the first tier, so this might be further optimized if it turns out it's on a hot path.

Therefore this looks like a Mono thing?

answered on Stack Overflow Feb 27, 2020 by

canton7

User contributions licensed under CC BY-SA 3.0