In C/C++ a switch statement can be lowered by the compiler to a jump table. I noticed a difference in the placement of the jump table between ARM and x86.
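For context, the kind of switch that typically gets lowered this way looks roughly like the sketch below. It is a hypothetical example (the function name `dispatch` and the case bodies are made up, and the source that produced the listings in this post is not shown); the point is only that the case values are dense and each case runs different code, which is when compilers tend to pick a jump table.

```c
#include <stdio.h>

/* Hypothetical example: dense case values 0..7, each with a distinct body.
 * With optimization enabled, compilers often (not always) lower this
 * to an indexed jump table plus a bounds check for the default case. */
int dispatch(unsigned op, int x)
{
    switch (op) {
    case 0: return x + 1;
    case 1: return x * 3;
    case 2: return x - 7;
    case 3: return x ^ 0x55;
    case 4: return x << 2;
    case 5: return -x;
    case 6: return x / 2;
    case 7: return x % 5;
    default: return 0;   /* out-of-range selector */
    }
}
```

Whether a table is actually emitted depends on the compiler, target and optimization level, so it is worth checking the disassembly rather than assuming.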
For x86 (and x86_64) the jump table is often placed outside of the function (e.g. in .rodata):
4005e0: 48 8b 45 d8 mov -0x28(%rbp),%rax
4005e4: 48 8b 0c c5 b0 0c 40 mov 0x400cb0(,%rax,8),%rcx
4005eb: 00
4005ec: ff e1 jmpq *%rcx
4005ee: 8b 45 e8 mov -0x18(%rbp),%eax
4005f1: 83 e8 66 sub $0x66,%eax
For ARM the jump table is interleaved with the function's code:
15c: e28f2004 add r2, pc, #4
160: e7911002 ldr r1, [r1, r2]
164: e1a0f001 mov pc, r1
168: 000001a4 .word 0x000001a4
16c: 000001b4 .word 0x000001b4
170: 000001e4 .word 0x000001e4
174: 00000214 .word 0x00000214
178: 00000214 .word 0x00000214
17c: 00000214 .word 0x00000214
180: 00000214 .word 0x00000214
184: 00000214 .word 0x00000214
188: 000001c4 .word 0x000001c4
18c: 000001f4 .word 0x000001f4
The above code was generated with clang 3.5 -target arm-none-eabi -march=armv7, but similar code is generated with gcc.
For completeness, here's the code for a switch statement on MIPS. The jump table is placed in the .rodata section.
4002b8: 2c85000b sltiu a1,a0,11
4002bc: afc40018 sw a0,24(s8) //local var that we switch on
4002c0: 10a00021 beqz a1,400348 <main0+0xb4> // default case
4002c4: 00000000 nop
4002c8: 8fc10018 lw at,24(s8) //the var that we switch on is in at
4002cc: 00011080 sll v0,at,0x2 // v0 = at<<2
4002d0: 3c030040 lui v1,0x40 // v1 = 0x40<<16
4002d4: 00431021 addu v0,v0,v1 // v0 = (at<<2) + v1
4002d8: 8c421848 lw v0,6216(v0) // v0 = *((at<<2)+0x401848)
4002dc: 00400008 jr v0 // jump
4002e0: 00000000 nop
The address of the jump table (0x00401848) is in .rodata.
$ readelf -e /tmp/muti-sw.mips.o | grep .rodata
[ 7] .rodata PROGBITS 00401848 001848 00069a 00 A 0 0 4
The above code was generated with clang 3.9.
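As a sanity check on the MIPS listing, the address arithmetic of the lui/sll/addu/lw sequence can be replayed in C. This is only a sketch: the function name `mips_entry_address` is hypothetical, and it models the effective address of the load, not the load itself.

```c
#include <stdint.h>

/* Replays the address computation from the MIPS listing above.
 * Returns the address of the jump-table entry for a given case index. */
uint32_t mips_entry_address(uint32_t index)
{
    uint32_t v1 = (uint32_t)0x40 << 16;  /* lui  v1,0x40      -> 0x00400000 */
    uint32_t v0 = index << 2;            /* sll  v0,at,0x2    -> index * 4  */
    v0 += v1;                            /* addu v0,v0,v1                   */
    return v0 + 6216;                    /* lw   v0,6216(v0): 6216 == 0x1848 */
}
```

For index 0 this gives 0x00401848, which matches the start address readelf reports for .rodata.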
Why is the jump table often interleaved with the function's code on ARM, but not on x86?
This answer implies that the way the cache works on ARM has something to do with it. Are there any other reasons?
Mainly this has to do with RISC vs CISC philosophy. On ARM the PC is almost a general-purpose register. You can see that with add r2, pc, #4; this puts the address of the table in r2. Since the table is loaded via the PC, it needs to go with the code. A simpler switch is possible:
ldr r1, [pc, r1] ; get table data via 'pc' (pc is the base; r1 holds the byte offset into the table)
add pc, r1 ; do switch
table:
.word offset_first_case ; ... etc.
The above is completely PC-relative. It looks like your code might need a relocation. If the case code is highly symmetric, a table might not even be needed; just pc += case * case_code_size.
Some ARM CPUs support table-branch instructions (tbb/tbh in Thumb-2), loosely comparable in spirit to x86's xlat table lookup. Beyond that, a switch/case implementation may depend on the compiler, the target ARM/x86 CPU, the data type and the density of the cases. For instance, the table might contain 'case,case_offset' pairs and be sorted, so that a binary search is performed in the sparse case.
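To make the sparse case concrete, here is a minimal C sketch of a sorted table of case values searched with the standard bsearch. It is an illustration of the idea, not what any particular compiler emits: the names (`case_entry`, `sparse_dispatch`, the `on_*` handlers) and the case values are all made up, and function pointers stand in for the code offsets a compiler would store.

```c
#include <stddef.h>
#include <stdlib.h>

typedef int (*handler_fn)(int);

/* One table row: a case value and the code to run for it. */
struct case_entry {
    int value;
    handler_fn target;
};

/* Hypothetical case bodies. */
static int on_small(int x)   { return x + 1; }
static int on_medium(int x)  { return x * 2; }
static int on_large(int x)   { return x - 3; }
static int on_default(int x) { (void)x; return -1; }

/* Must stay sorted by value for the binary search to work. */
static const struct case_entry sparse_table[] = {
    { 7,     on_small  },
    { 1000,  on_medium },
    { 65536, on_large  },
};

/* bsearch comparator: key is an int, element is a case_entry. */
static int compare_case(const void *key, const void *elem)
{
    int k = *(const int *)key;
    int v = ((const struct case_entry *)elem)->value;
    return (k > v) - (k < v);
}

int sparse_dispatch(int selector, int arg)
{
    const struct case_entry *hit =
        bsearch(&selector, sparse_table,
                sizeof sparse_table / sizeof sparse_table[0],
                sizeof sparse_table[0], compare_case);
    return hit ? hit->target(arg) : on_default(arg);
}
```

A dense table indexed directly by the selector would waste 65537 slots for these three cases, which is why a searched table (or a compare-and-branch tree) is the usual choice here.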
Note: The ARM pc is two instructions (eight bytes) ahead due to the original ARM pipeline size. ARM maintains this offset when using the PC in order to stay compatible.
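The offset can be checked against the ARM listing in the question. The sketch below models only the arithmetic for ARM (not Thumb) state; the helper names are hypothetical.

```c
#include <stdint.h>

/* In ARM state, reading pc yields the address of the current
 * instruction plus 8 (two 4-byte instructions ahead). */
uint32_t arm_pc_as_read(uint32_t insn_addr)
{
    return insn_addr + 8u;
}

/* Result of "add rd, pc, #imm" located at insn_addr. */
uint32_t arm_add_pc_imm(uint32_t insn_addr, uint32_t imm)
{
    return arm_pc_as_read(insn_addr) + imm;
}
```

For the instruction at 0x15c, add r2, pc, #4 gives 0x15c + 8 + 4 = 0x168, which is exactly where the first .word of the table sits.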