I am testing the following snippet using JMH.
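import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

class A {
    // Pure method: the result depends only on i.
    public int test(int i) {
        long sum = 0L;
        for (int j = 0; j < i; ++j) {
            sum += 1 << (j % 32);
        }
        return (int) sum;
    }
}

@State(Scope.Thread)
public class MyBenchmark {
    public A a;
    int x;

    @Setup(Level.Trial)
    public void init() {
        a = new A();
        x = 1;
    }

    @Benchmark
    public int testMethod() {
        int res = 0;
        res += a.test(x);  // first call
        res += a.test(x);  // second call with the same argument
        return res;
    }
}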
Build and run the example using
mvn package && java -XX:-UseCompressedOops -XX:CompileCommand='print, *.testMethod' -jar target/benchmarks.jar -wi 10 -i 1 -f 1
to get the following assembly (only the assembly for the final compilation is shown).
ImmutableOopMap{}pc offsets: 762 772 800
Compiled method (c2) 397 551 4 org.sample.MyBenchmark::testMethod (32 bytes)
total in heap [0x00007f96dd74bd90,0x00007f96dd74c170] = 992
relocation [0x00007f96dd74bed0,0x00007f96dd74bee0] = 16
main code [0x00007f96dd74bee0,0x00007f96dd74bfa0] = 192
stub code [0x00007f96dd74bfa0,0x00007f96dd74bfb8] = 24
oops [0x00007f96dd74bfb8,0x00007f96dd74bfc0] = 8
metadata [0x00007f96dd74bfc0,0x00007f96dd74bfd0] = 16
scopes data [0x00007f96dd74bfd0,0x00007f96dd74c038] = 104
scopes pcs [0x00007f96dd74c038,0x00007f96dd74c168] = 304
dependencies [0x00007f96dd74c168,0x00007f96dd74c170] = 8
----------------------------------------------------------------------
org/sample/MyBenchmark.testMethod()I [0x00007f96dd74bee0, 0x00007f96dd74bfb8] 216 bytes
Argument 0 is unknown.RIP: 0x7f96dd74bee0 Code size: 0x000000d8
[Entry Point]
[Constants]
# {method} {0x00007f95ed0efe80} 'testMethod' '()I' in 'org/sample/MyBenchmark'
# [sp+0x20] (sp of caller)
0x00007f96dd74bee0: cmp 0x8(%rsi),%rax
0x00007f96dd74bee4: jne 0x7f96d5c99c60 ; {runtime_call ic_miss_stub}
0x00007f96dd74beea: nop
0x00007f96dd74beec: nopl 0x0(%rax)
[Verified Entry Point]
0x00007f96dd74bef0: mov %eax,0xfffffffffffec000(%rsp)
0x00007f96dd74bef7: push %rbp
0x00007f96dd74bef8: sub $0x10,%rsp ;*synchronization entry
; - org.sample.MyBenchmark::testMethod@-1 (line 66)
0x00007f96dd74befc: mov 0x10(%rsi),%r11d ;*getfield x {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.MyBenchmark::testMethod@8 (line 67)
0x00007f96dd74bf00: mov 0x18(%rsi),%r10
0x00007f96dd74bf04: test %r10,%r10
0x00007f96dd74bf07: je 0x7f96dd74bf73 ;*invokevirtual test {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.MyBenchmark::testMethod@11 (line 67)
0x00007f96dd74bf09: xor %eax,%eax
0x00007f96dd74bf0b: test %r11d,%r11d
0x00007f96dd74bf0e: jle 0x7f96dd74bf67 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.A::test@8 (line 46)
; - org.sample.MyBenchmark::testMethod@11 (line 67)
0x00007f96dd74bf10: xor %r10d,%r10d
0x00007f96dd74bf13: xor %r9d,%r9d
0x00007f96dd74bf16: xor %r8d,%r8d
0x00007f96dd74bf19: mov $0x1,%edi ;*ishl {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.A::test@18 (line 47)
; - org.sample.MyBenchmark::testMethod@11 (line 67)
0x00007f96dd74bf1e: movsxd %edi,%rcx
0x00007f96dd74bf21: add %rcx,%r8 ;*ladd {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.A::test@20 (line 47)
; - org.sample.MyBenchmark::testMethod@11 (line 67)
0x00007f96dd74bf24: incl %r9d ;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.A::test@22 (line 46)
; - org.sample.MyBenchmark::testMethod@11 (line 67)
0x00007f96dd74bf27: cmp %r11d,%r9d
0x00007f96dd74bf2a: jnl 0x7f96dd74bf3b ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.A::test@8 (line 46)
; - org.sample.MyBenchmark::testMethod@11 (line 67)
0x00007f96dd74bf2c: mov %r9d,%ecx
0x00007f96dd74bf2f: and $0x1f,%ecx
0x00007f96dd74bf32: mov $0x1,%edi
0x00007f96dd74bf37: shl %cl,%edi ;*ishl {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.A::test@18 (line 47)
; - org.sample.MyBenchmark::testMethod@11 (line 67)
0x00007f96dd74bf39: jmp 0x7f96dd74bf1e
0x00007f96dd74bf3b: mov $0x1,%r9d ;*ishl {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.A::test@18 (line 47)
; - org.sample.MyBenchmark::testMethod@25 (line 68)
0x00007f96dd74bf41: movsxd %r9d,%r9
0x00007f96dd74bf44: add %r9,%r10 ;*ladd {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.A::test@20 (line 47)
; - org.sample.MyBenchmark::testMethod@25 (line 68)
0x00007f96dd74bf47: incl %eax ;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.A::test@22 (line 46)
; - org.sample.MyBenchmark::testMethod@25 (line 68)
0x00007f96dd74bf49: cmp %r11d,%eax
0x00007f96dd74bf4c: jnl 0x7f96dd74bf5e ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.A::test@8 (line 46)
; - org.sample.MyBenchmark::testMethod@25 (line 68)
0x00007f96dd74bf4e: mov %eax,%ecx
0x00007f96dd74bf50: and $0x1f,%ecx
0x00007f96dd74bf53: mov $0x1,%r9d
0x00007f96dd74bf59: shl %cl,%r9d ;*ishl {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.A::test@18 (line 47)
; - org.sample.MyBenchmark::testMethod@25 (line 68)
0x00007f96dd74bf5c: jmp 0x7f96dd74bf41
0x00007f96dd74bf5e: mov %r8d,%r11d
0x00007f96dd74bf61: mov %r10d,%eax
0x00007f96dd74bf64: add %r11d,%eax ;*iadd {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.MyBenchmark::testMethod@28 (line 68)
0x00007f96dd74bf67: add $0x10,%rsp
0x00007f96dd74bf6b: pop %rbp
0x00007f96dd74bf6c: test %eax,0x165a708e(%rip) ; {poll_return}
0x00007f96dd74bf72: retq
0x00007f96dd74bf73: mov $0xfffffff6,%esi
0x00007f96dd74bf78: mov %r11d,%ebp
0x00007f96dd74bf7b: callq 0x7f96d5c9b560 ; ImmutableOopMap{}
;*invokevirtual test {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.MyBenchmark::testMethod@11 (line 67)
; {runtime_call UncommonTrapBlob}
0x00007f96dd74bf80: callq 0x7f96f2772aa0 ;*invokevirtual test {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.MyBenchmark::testMethod@11 (line 67)
; {runtime_call}
0x00007f96dd74bf85: hlt
0x00007f96dd74bf86: hlt
0x00007f96dd74bf87: hlt
0x00007f96dd74bf88: hlt
0x00007f96dd74bf89: hlt
0x00007f96dd74bf8a: hlt
0x00007f96dd74bf8b: hlt
0x00007f96dd74bf8c: hlt
0x00007f96dd74bf8d: hlt
0x00007f96dd74bf8e: hlt
0x00007f96dd74bf8f: hlt
0x00007f96dd74bf90: hlt
0x00007f96dd74bf91: hlt
0x00007f96dd74bf92: hlt
0x00007f96dd74bf93: hlt
0x00007f96dd74bf94: hlt
0x00007f96dd74bf95: hlt
0x00007f96dd74bf96: hlt
0x00007f96dd74bf97: hlt
0x00007f96dd74bf98: hlt
0x00007f96dd74bf99: hlt
0x00007f96dd74bf9a: hlt
0x00007f96dd74bf9b: hlt
0x00007f96dd74bf9c: hlt
0x00007f96dd74bf9d: hlt
0x00007f96dd74bf9e: hlt
0x00007f96dd74bf9f: hlt
[Exception Handler]
[Stub Code]
If I understand the assembly correctly, common subexpression elimination (CSE) is not performed for a.test(x). I guess the direct reason is that different registers are used for the two calls, which prevents (or hinders) the JIT from doing CSE.
test is merely an example of a pure method that does nothing interesting. My intention is that test(x) is expensive enough for some x that CSE would be beneficial.
I wonder if there is any way to enable CSE explicitly in this case or in similar scenarios, or how CSE works in the JVM's JIT compiler.
openjdk version "9-internal"
OpenJDK Runtime Environment (build 9-internal+0-2016-04-14-195246.buildd.src)
OpenJDK 64-Bit Server VM (build 9-internal+0-2016-04-14-195246.buildd.src, mixed mode)
To begin with, you are looking at the wrong method. This is not what is executed on the hot path. MyBenchmark.testMethod is inlined into the benchmark loop, and most of the time is spent in a JMH-generated method like
org.sample.generated.MyBenchmark_testMethod_jmhTest::testMethod_avgt_jmhStub
You can check this by running JMH with the -prof perfasm option.
Anyway, you've guessed right that CSE does not work in the given example. The reason is not the different registers, though (register allocation is performed late, after most machine-independent optimizations); the control flow here is simply too complicated. Generally, CSE in HotSpot is not applied to subgraphs with cycles (i.e. loops).
I wonder if there's any way to enable CSE explicitly in this case or other similar scenarios.
Well, of course: by doing it manually, i.e. by caching the method result in a temporary variable. HotSpot is not too smart in detecting common subexpressions: for example, s*y+x and y*s+x are treated as different expressions, but you may help the JIT by rewriting the code as in this question.
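A minimal sketch of the manual rewrite, outside JMH, might look like the following. The class name ManualCse and the helpers callTwice and callOnce are hypothetical and only serve as illustration; A.test is copied from the question. The rewrite is only valid because test is pure, so both variants return the same value.

```java
public class ManualCse {
    // Same pure method as in the question.
    public static class A {
        public int test(int i) {
            long sum = 0L;
            for (int j = 0; j < i; ++j) {
                sum += 1 << (j % 32);
            }
            return (int) sum;
        }
    }

    // Original shape (hypothetical helper): test() is evaluated twice.
    public static int callTwice(A a, int x) {
        int res = 0;
        res += a.test(x);
        res += a.test(x);
        return res;
    }

    // Manual CSE (hypothetical helper): evaluate once, reuse the cached result.
    public static int callOnce(A a, int x) {
        int t = a.test(x);
        return t + t;
    }

    public static void main(String[] args) {
        A a = new A();
        for (int x : new int[] {0, 1, 5, 40}) {
            // Both columns should agree for every x.
            System.out.println(x + ": " + callTwice(a, x) + " " + callOnce(a, x));
        }
    }
}
```

With the temporary variable the loop inside test appears only once in the source, so the result no longer depends on whether the JIT recognizes the two calls as equivalent.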