JVM JIT common subexpression elimination


I am testing the following snippet using JMH.

class A {
    public int test(int i) {
        long sum = 0L;
        for (int j=0; j<i; ++j) {
            sum += 1 << (j % 32);
        }
        return (int)sum;
    }
}

@State(Scope.Thread)
public class MyBenchmark {
    public A a;
    int x;

    @Setup(Level.Trial)
    public void init() {
        a = new A();
        x = 1;
    }

    @Benchmark
    public int testMethod() {
        int res = 0;
        res += a.test(x);
        res += a.test(x);
        return res;
    }
}

I build and run the example using mvn package && java -XX:-UseCompressedOops -XX:CompileCommand='print, *.testMethod' -jar target/benchmarks.jar -wi 10 -i 1 -f 1 and get the following assembly (only the assembly for the final compilation is shown).

ImmutableOopMap{}pc offsets: 762 772 800
Compiled method (c2)     397  551       4       org.sample.MyBenchmark::testMethod (32 bytes)
 total in heap  [0x00007f96dd74bd90,0x00007f96dd74c170] = 992
 relocation     [0x00007f96dd74bed0,0x00007f96dd74bee0] = 16
 main code      [0x00007f96dd74bee0,0x00007f96dd74bfa0] = 192
 stub code      [0x00007f96dd74bfa0,0x00007f96dd74bfb8] = 24
 oops           [0x00007f96dd74bfb8,0x00007f96dd74bfc0] = 8
 metadata       [0x00007f96dd74bfc0,0x00007f96dd74bfd0] = 16
 scopes data    [0x00007f96dd74bfd0,0x00007f96dd74c038] = 104
 scopes pcs     [0x00007f96dd74c038,0x00007f96dd74c168] = 304
 dependencies   [0x00007f96dd74c168,0x00007f96dd74c170] = 8
----------------------------------------------------------------------
org/sample/MyBenchmark.testMethod()I  [0x00007f96dd74bee0, 0x00007f96dd74bfb8]  216 bytes
Argument 0 is unknown.
RIP: 0x7f96dd74bee0 Code size: 0x000000d8
[Entry Point]
[Constants]
  # {method} {0x00007f95ed0efe80} 'testMethod' '()I' in 'org/sample/MyBenchmark'
  #           [sp+0x20]  (sp of caller)
  0x00007f96dd74bee0: cmp     0x8(%rsi),%rax
  0x00007f96dd74bee4: jne     0x7f96d5c99c60    ;   {runtime_call ic_miss_stub}
  0x00007f96dd74beea: nop
  0x00007f96dd74beec: nopl    0x0(%rax)
[Verified Entry Point]
  0x00007f96dd74bef0: mov     %eax,0xfffffffffffec000(%rsp)
  0x00007f96dd74bef7: push    %rbp
  0x00007f96dd74bef8: sub     $0x10,%rsp        ;*synchronization entry
                                                ; - org.sample.MyBenchmark::testMethod@-1 (line 66)

  0x00007f96dd74befc: mov     0x10(%rsi),%r11d  ;*getfield x {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.MyBenchmark::testMethod@8 (line 67)

  0x00007f96dd74bf00: mov     0x18(%rsi),%r10
  0x00007f96dd74bf04: test    %r10,%r10
  0x00007f96dd74bf07: je      0x7f96dd74bf73    ;*invokevirtual test {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.MyBenchmark::testMethod@11 (line 67)

  0x00007f96dd74bf09: xor     %eax,%eax
  0x00007f96dd74bf0b: test    %r11d,%r11d
  0x00007f96dd74bf0e: jle     0x7f96dd74bf67    ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.A::test@8 (line 46)
                                                ; - org.sample.MyBenchmark::testMethod@11 (line 67)

  0x00007f96dd74bf10: xor     %r10d,%r10d
  0x00007f96dd74bf13: xor     %r9d,%r9d
  0x00007f96dd74bf16: xor     %r8d,%r8d
  0x00007f96dd74bf19: mov     $0x1,%edi         ;*ishl {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.A::test@18 (line 47)
                                                ; - org.sample.MyBenchmark::testMethod@11 (line 67)

  0x00007f96dd74bf1e: movsxd  %edi,%rcx
  0x00007f96dd74bf21: add     %rcx,%r8          ;*ladd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.A::test@20 (line 47)
                                                ; - org.sample.MyBenchmark::testMethod@11 (line 67)

  0x00007f96dd74bf24: incl    %r9d              ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.A::test@22 (line 46)
                                                ; - org.sample.MyBenchmark::testMethod@11 (line 67)

  0x00007f96dd74bf27: cmp     %r11d,%r9d
  0x00007f96dd74bf2a: jnl     0x7f96dd74bf3b    ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.A::test@8 (line 46)
                                                ; - org.sample.MyBenchmark::testMethod@11 (line 67)

  0x00007f96dd74bf2c: mov     %r9d,%ecx
  0x00007f96dd74bf2f: and     $0x1f,%ecx
  0x00007f96dd74bf32: mov     $0x1,%edi
  0x00007f96dd74bf37: shl     %cl,%edi          ;*ishl {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.A::test@18 (line 47)
                                                ; - org.sample.MyBenchmark::testMethod@11 (line 67)

  0x00007f96dd74bf39: jmp     0x7f96dd74bf1e
  0x00007f96dd74bf3b: mov     $0x1,%r9d         ;*ishl {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.A::test@18 (line 47)
                                                ; - org.sample.MyBenchmark::testMethod@25 (line 68)

  0x00007f96dd74bf41: movsxd  %r9d,%r9
  0x00007f96dd74bf44: add     %r9,%r10          ;*ladd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.A::test@20 (line 47)
                                                ; - org.sample.MyBenchmark::testMethod@25 (line 68)

  0x00007f96dd74bf47: incl    %eax              ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.A::test@22 (line 46)
                                                ; - org.sample.MyBenchmark::testMethod@25 (line 68)

  0x00007f96dd74bf49: cmp     %r11d,%eax
  0x00007f96dd74bf4c: jnl     0x7f96dd74bf5e    ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.A::test@8 (line 46)
                                                ; - org.sample.MyBenchmark::testMethod@25 (line 68)

  0x00007f96dd74bf4e: mov     %eax,%ecx
  0x00007f96dd74bf50: and     $0x1f,%ecx
  0x00007f96dd74bf53: mov     $0x1,%r9d
  0x00007f96dd74bf59: shl     %cl,%r9d          ;*ishl {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.A::test@18 (line 47)
                                                ; - org.sample.MyBenchmark::testMethod@25 (line 68)

  0x00007f96dd74bf5c: jmp     0x7f96dd74bf41
  0x00007f96dd74bf5e: mov     %r8d,%r11d
  0x00007f96dd74bf61: mov     %r10d,%eax
  0x00007f96dd74bf64: add     %r11d,%eax        ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.MyBenchmark::testMethod@28 (line 68)

  0x00007f96dd74bf67: add     $0x10,%rsp
  0x00007f96dd74bf6b: pop     %rbp
  0x00007f96dd74bf6c: test    %eax,0x165a708e(%rip)  ;   {poll_return}
  0x00007f96dd74bf72: retq
  0x00007f96dd74bf73: mov     $0xfffffff6,%esi
  0x00007f96dd74bf78: mov     %r11d,%ebp
  0x00007f96dd74bf7b: callq   0x7f96d5c9b560    ; ImmutableOopMap{}
                                                ;*invokevirtual test {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.MyBenchmark::testMethod@11 (line 67)
                                                ;   {runtime_call UncommonTrapBlob}
  0x00007f96dd74bf80: callq   0x7f96f2772aa0    ;*invokevirtual test {reexecute=0 rethrow=0 return_oop=0}
                                                ; - org.sample.MyBenchmark::testMethod@11 (line 67)
                                                ;   {runtime_call}
  0x00007f96dd74bf85: hlt
  0x00007f96dd74bf86: hlt
  0x00007f96dd74bf87: hlt
  0x00007f96dd74bf88: hlt
  0x00007f96dd74bf89: hlt
  0x00007f96dd74bf8a: hlt
  0x00007f96dd74bf8b: hlt
  0x00007f96dd74bf8c: hlt
  0x00007f96dd74bf8d: hlt
  0x00007f96dd74bf8e: hlt
  0x00007f96dd74bf8f: hlt
  0x00007f96dd74bf90: hlt
  0x00007f96dd74bf91: hlt
  0x00007f96dd74bf92: hlt
  0x00007f96dd74bf93: hlt
  0x00007f96dd74bf94: hlt
  0x00007f96dd74bf95: hlt
  0x00007f96dd74bf96: hlt
  0x00007f96dd74bf97: hlt
  0x00007f96dd74bf98: hlt
  0x00007f96dd74bf99: hlt
  0x00007f96dd74bf9a: hlt
  0x00007f96dd74bf9b: hlt
  0x00007f96dd74bf9c: hlt
  0x00007f96dd74bf9d: hlt
  0x00007f96dd74bf9e: hlt
  0x00007f96dd74bf9f: hlt
[Exception Handler]
[Stub Code]

If I understand the assembly correctly, common subexpression elimination (CSE) is not performed for a.test(x). I guess the direct reason is that different registers are used for the two calls, which prevents (or hinders) the JIT from doing CSE.

test is merely an example of a pure method that does nothing interesting. My intention is that test(x) is expensive enough for some x that CSE would be beneficial.

I wonder if there's any way to enable CSE explicitly in this case or other similar scenarios. Or how does CSE work in the JVM's JIT?

ENV:

openjdk version "9-internal"
OpenJDK Runtime Environment (build 9-internal+0-2016-04-14-195246.buildd.src)
OpenJDK 64-Bit Server VM (build 9-internal+0-2016-04-14-195246.buildd.src, mixed mode)

java
jvm
jit
asked on Stack Overflow Jan 10, 2018 by Albert Netymk

1 Answer


To begin with, you are looking at the wrong method. This is not what is executed on the hot path. MyBenchmark.testMethod is inlined into the benchmark loop, and most of the time is spent in a JMH-generated method like

org.sample.generated.MyBenchmark_testMethod_jmhTest::testMethod_avgt_jmhStub

You can check this by running JMH with the -prof perfasm option.


Anyway, you guessed right that CSE is not applied in the given example. The reason is not the different registers, though: register allocation is performed late, after most machine-independent optimizations. The real reason is that the control flow is too complicated here. Generally, CSE in HotSpot is not applied to subgraphs with cycles (i.e. loops).

I wonder if there's any way to enable CSE explicitly in this case or other similar scenarios.

Well, of course: by doing it manually, i.e. by caching the method result in a temporary variable. HotSpot is not too smart at detecting common subexpressions: for example, s*y+x and y*s+x are treated as different expressions, but you may help the JIT by rewriting the code like in this question.
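
A minimal sketch of the manual approach, assuming the A.test method from the question. The class name ManualCseExample and the helpers testMethodOriginal and testMethodCached are made up for illustration and are not part of JMH or of the original benchmark. The rewrite is only valid because test is pure: it reads no mutable state and has no side effects, so calling it once and reusing the result gives the same answer as calling it twice.

```java
public class ManualCseExample {

    // Same pure method as in the question.
    public static class A {
        public int test(int i) {
            long sum = 0L;
            for (int j = 0; j < i; ++j) {
                sum += 1 << (j % 32);
            }
            return (int) sum;
        }
    }

    // Shape of the original benchmark body: test(x) is called twice,
    // so the loop inside it is executed twice.
    public static int testMethodOriginal(A a, int x) {
        int res = 0;
        res += a.test(x);
        res += a.test(x);
        return res;
    }

    // Manual CSE: evaluate the common subexpression once, keep it in a
    // temporary variable, and reuse it.
    public static int testMethodCached(A a, int x) {
        int t = a.test(x);
        return t + t;
    }

    public static void main(String[] args) {
        A a = new A();
        for (int x : new int[] {0, 1, 5, 40}) {
            int original = testMethodOriginal(a, x);
            int cached = testMethodCached(a, x);
            System.out.println("x=" + x + " original=" + original
                    + " cached=" + cached + " same=" + (original == cached));
        }
    }
}
```

In a real benchmark the cached variant would go into its own @Benchmark method, so that both versions can be compared under the same JMH settings.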

answered on Stack Overflow Jan 11, 2018 by apangin

User contributions licensed under CC BY-SA 3.0