Position of a switch statement's jump table within the code on ARM


In C/C++, a switch statement can be lowered by the compiler to a jump table. I noticed a difference in where the jump table is placed on ARM and on x86.
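To make "lowered to a jump table" concrete, here is a sketch in C of a dense switch next to a hand-written equivalent of what such a lowering does: one unsigned bounds check for the default case, then an indexed indirect transfer. The names (classify_switch, jump_table, etc.) are hypothetical. Because portable C has no computed goto, the table holds function pointers and is called; a compiler instead emits an indirect jump into the function's own code, and whether it uses a table at all depends on its heuristics and flags.

```c
#include <stddef.h>

/* A dense switch (cases 0..4, no gaps): the kind of switch compilers
   often, but not always, lower to a jump table. */
int classify_switch(unsigned x) {
    switch (x) {
    case 0: return 10;
    case 1: return 20;
    case 2: return 30;
    case 3: return 40;
    case 4: return 50;
    default: return -1;
    }
}

/* Hand-written equivalent of the lowering (hypothetical names). */
typedef int (*case_fn)(void);

static int case0(void) { return 10; }
static int case1(void) { return 20; }
static int case2(void) { return 30; }
static int case3(void) { return 40; }
static int case4(void) { return 50; }

static case_fn const jump_table[] = { case0, case1, case2, case3, case4 };

int classify_table(unsigned x) {
    /* One unsigned compare covers both "too large" and "negative" inputs. */
    if (x >= sizeof jump_table / sizeof jump_table[0])
        return -1;              /* default case */
    return jump_table[x]();     /* indirect call through the table */
}
```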


For x86 (and x86-64), the jump table is often placed outside the function (e.g. in .rodata):

  4005e0:       48 8b 45 d8             mov    -0x28(%rbp),%rax
  4005e4:       48 8b 0c c5 b0 0c 40    mov    0x400cb0(,%rax,8),%rcx
  4005eb:       00 
  4005ec:       ff e1                   jmpq   *%rcx
  4005ee:       8b 45 e8                mov    -0x18(%rbp),%eax
  4005f1:       83 e8 66                sub    $0x66,%eax
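The mov above encodes the table's absolute address, 0x400cb0, as a 32-bit displacement inside the instruction (bytes b0 0c 40 00) and scales the index by 8, since each entry is a 64-bit code address. A small sketch of that address arithmetic, with the constants taken from the disassembly and the helper name hypothetical:

```c
#include <stdint.h>

/* Constants read off the x86-64 disassembly. */
#define X86_TABLE_BASE UINT64_C(0x400cb0) /* disp32 in: mov 0x400cb0(,%rax,8),%rcx */
#define X86_ENTRY_SIZE UINT64_C(8)        /* each entry is a 64-bit code address */

/* Address the mov reads from: base + index * scale (no base register). */
uint64_t x86_table_entry_addr(uint64_t index) {
    return X86_TABLE_BASE + index * X86_ENTRY_SIZE;
}
```

Because the instruction carries the table's address directly, nothing ties the table to the code's location, so it can live in .rodata.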


For ARM, the jump table is interleaved with the function's code:

 15c:   e28f2004        add     r2, pc, #4
 160:   e7911002        ldr     r1, [r1, r2]
 164:   e1a0f001        mov     pc, r1
 168:   000001a4        .word   0x000001a4
 16c:   000001b4        .word   0x000001b4
 170:   000001e4        .word   0x000001e4
 174:   00000214        .word   0x00000214
 178:   00000214        .word   0x00000214
 17c:   00000214        .word   0x00000214
 180:   00000214        .word   0x00000214
 184:   00000214        .word   0x00000214
 188:   000001c4        .word   0x000001c4
 18c:   000001f4        .word   0x000001f4

The above code was generated with clang 3.5 (-target arm-none-eabi -march=armv7), but gcc generates similar code.
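The ARM sequence can be traced the same way. This C sketch models it under two assumptions: the code runs in ARM (A32) state, where reading pc yields the instruction's address plus 8, and r1 already holds index * 4 when the ldr executes (the scaling is not shown in the excerpt). The table words are copied from the disassembly; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the A32 (ARM-state) dispatch in the question:
     15c: add r2, pc, #4    ; r2 = address of the table
     160: ldr r1, [r1, r2]  ; r1 = table word (r1 assumed to hold index * 4)
     164: mov pc, r1        ; jump to the loaded address
     168: .word ...         ; table starts right after the jump */
#define A32_PC_READ_OFFSET 8u   /* reading pc in ARM state = insn address + 8 */

/* The ten words at 0x168..0x18c, copied from the disassembly. */
static const uint32_t arm_table[] = {
    0x1a4, 0x1b4, 0x1e4, 0x214, 0x214,
    0x214, 0x214, 0x214, 0x1c4, 0x1f4,
};

/* add r2, pc, #4 executed at add_insn_addr */
uint32_t arm_table_base(uint32_t add_insn_addr) {
    return add_insn_addr + A32_PC_READ_OFFSET + 4u;
}

/* ldr + mov pc: each entry is used directly as the new pc */
uint32_t arm_switch_target(size_t index) {
    return arm_table[index];   /* caller must bounds-check, as the compiler does */
}
```

The table is located PC-relatively, but since mov pc, r1 uses each entry directly as the new pc, the entries themselves are absolute addresses.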


For completeness, here's the code for a switch statement on MIPS. The jump table is placed in the .rodata section.

 4002b8:    2c85000b    sltiu   a1,a0,11       
 4002bc:    afc40018    sw  a0,24(s8)       //local var that we switch on 
 4002c0:    10a00021    beqz    a1,400348 <main0+0xb4> // default case
 4002c4:    00000000    nop
 4002c8:    8fc10018    lw  at,24(s8)      //the var that we switch on is in at
 4002cc:    00011080    sll v0,at,0x2      // v0 = at<<2
 4002d0:    3c030040    lui v1,0x40        // v1 = 0x40<<16
 4002d4:    00431021    addu    v0,v0,v1   // v0 = (at<<2) + v1 
 4002d8:    8c421848    lw  v0,6216(v0)    // v0 = *((at<<2)+0x401848)
 4002dc:    00400008    jr  v0             // jump
 4002e0:    00000000    nop

The address of the jump table (0x00401848) is in .rodata.

 $ readelf -e /tmp/muti-sw.mips.o  | grep .rodata
 [ 7] .rodata           PROGBITS        00401848 001848 00069a 00   A  0   0  4

The above code was generated with clang 3.9.
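The comments in the MIPS listing can be checked with a little arithmetic. This sketch reproduces the address computation (lui supplies 0x40 << 16 = 0x400000, and the lw offset 6216 is 0x1848) and checks that all 11 entries fall inside the .rodata range reported by readelf. It assumes 4-byte entries, as the shift by 2 implies; the names are hypothetical.

```c
#include <stdint.h>

#define MIPS_NUM_CASES 11u                 /* from: sltiu a1,a0,11 */
#define RODATA_ADDR    UINT32_C(0x401848)  /* from readelf */
#define RODATA_SIZE    UINT32_C(0x69a)

/* sltiu: one unsigned compare selects table dispatch vs. default */
int mips_in_range(uint32_t x) {
    return x < MIPS_NUM_CASES;
}

/* sll v0,at,2 ; lui v1,0x40 ; addu v0,v0,v1 ; lw v0,6216(v0) */
uint32_t mips_entry_addr(uint32_t index) {
    uint32_t hi = UINT32_C(0x40) << 16;    /* lui: 0x400000 */
    return (index << 2) + hi + 6216u;      /* 0x400000 + 0x1848 + index * 4 */
}

/* Does the whole table lie inside .rodata? */
int mips_table_in_rodata(void) {
    uint32_t first = mips_entry_addr(0);
    uint32_t end = mips_entry_addr(MIPS_NUM_CASES);   /* one past the last entry */
    return first >= RODATA_ADDR && end <= RODATA_ADDR + RODATA_SIZE;
}
```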


Why is the jump table often interleaved with the function's code on ARM, but not on x86?

This answer suggests that the way the cache works on ARM has something to do with it. Are there any other reasons?

asked on Stack Overflow Jun 1, 2016 by cojocar • edited May 23, 2017 by Community

1 Answer


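Mainly this has to do with the RISC vs. CISC philosophy. On ARM, the PC is almost a general-purpose register. You can see that in add r2, pc, #4, which puts the address of the table in r2 (pc reads as 0x15c + 8 = 0x164; adding 4 gives 0x168, the first .word). ARM instructions are a fixed 32 bits wide, so they cannot embed a full 32-bit address the way the x86 mov embeds 0x400cb0; addressing nearby data relative to the PC is the natural idiom. Since the table is addressed relative to the PC, it has to stay close to the code. A simpler switch is possible: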

     ldr     r1, [pc, r1, lsl #2]  ; r1 = table[r1]; pc reads as this address + 8, i.e. the table
     add     pc, pc, r1            ; do switch: jump to (this address + 8) + offset
     .word   offset_first_case     ; ... etc.

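The above is completely PC-relative, so it needs no relocation. Your table holds absolute addresses (each .word is loaded straight into pc), so your code probably does need relocations. If the code for every case is highly regular (each case the same size), a table might not even be needed: just pc += case * case_code_size.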
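The difference between an absolute table and a PC-relative one shows up when the code is moved. A sketch with made-up addresses and offsets (everything here is hypothetical): an absolute table has to be rebuilt, i.e. relocated, for each load address, whereas a table of offsets from a run-time base works unchanged, and equal-sized case bodies need no table at all.

```c
#include <stddef.h>
#include <stdint.h>

#define NCASES 4
/* Hypothetical distances from a base address to each case body. */
static const uint32_t case_offsets[NCASES] = { 0x10, 0x20, 0x30, 0x40 };

/* Absolute table: entries depend on the load address (relocation needed). */
void build_absolute_table(uint32_t load_addr, uint32_t table[NCASES]) {
    for (size_t i = 0; i < NCASES; ++i)
        table[i] = load_addr + case_offsets[i];
}

/* Relative table: the same offsets work at any load address, because the
   base comes from pc at run time (like add pc, pc, r1). */
uint32_t relative_target(uint32_t pc_base, size_t index) {
    return pc_base + case_offsets[index];
}

/* No table: every case body is case_code_size bytes long. */
uint32_t computed_target(uint32_t first_case_addr, size_t index,
                         uint32_t case_code_size) {
    return first_case_addr + (uint32_t)index * case_code_size;
}
```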

Some CPUs have dedicated table instructions (xlat on x86; tbb/tbh, table branch byte/halfword, in ARM's Thumb-2 instruction set), and a switch/case implementation may depend on the compiler, the target ARM/x86 CPU, the data type, and the density of the cases. For instance, when the cases are sparse, the table might contain (case, case_offset) pairs sorted by case value so that a binary search can be performed.
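A dense table is wasteful for sparse cases (cases -50 and 100000 would need over 100,000 slots). The sorted-pairs idea can be sketched in C with the standard bsearch; the case values and targets are invented, and a compiler might instead emit an equivalent tree of compare-and-branch instructions.

```c
#include <stdlib.h>

struct case_entry {
    int value;    /* case label */
    int target;   /* stand-in for case_offset */
};

/* Hypothetical sparse switch, sorted by case value. */
static const struct case_entry sparse_cases[] = {
    { -50, 1 }, { 7, 2 }, { 100, 3 }, { 4096, 4 }, { 100000, 5 },
};

static int cmp_case(const void *key, const void *elem) {
    int k = *(const int *)key;
    int v = ((const struct case_entry *)elem)->value;
    return (k > v) - (k < v);   /* avoids overflow of k - v */
}

/* Returns the matching target, or default_target if no case matches. */
int sparse_switch(int x, int default_target) {
    const struct case_entry *e =
        bsearch(&x, sparse_cases,
                sizeof sparse_cases / sizeof sparse_cases[0],
                sizeof sparse_cases[0], cmp_case);
    return e ? e->target : default_target;
}
```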

Note: In ARM state, reading the pc gives the address of the current instruction plus two instructions (eight bytes), a consequence of the original three-stage ARM pipeline. Later ARM cores, with different pipelines, keep this offset for compatibility.
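The pc offset explains the constants in both listings. A small sketch with hypothetical helper names: it returns the value an instruction sees when it reads pc (+8 in ARM state; +4 in Thumb state for most instructions) and the offset needed to reach a table from there. An ldr placed two instructions before its table, as in the snippet above, needs offset 0; the question's add at 0x15c needs #4 to reach the table at 0x168.

```c
#include <stdint.h>

/* Value an instruction observes when it reads pc:
     ARM (A32) state: its own address + 8 (two 4-byte instructions ahead)
     Thumb state:     its own address + 4 (for most instructions) */
uint32_t pc_read_value(uint32_t insn_addr, int thumb_state) {
    return insn_addr + (thumb_state ? 4u : 8u);
}

/* Immediate an A32 instruction at reader_addr needs to reach table_addr. */
uint32_t table_offset_from_pc(uint32_t reader_addr, uint32_t table_addr) {
    return table_addr - pc_read_value(reader_addr, 0);
}
```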

answered on Stack Overflow Jun 1, 2016 by artless noise • edited Jun 1, 2016 by artless noise

User contributions licensed under CC BY-SA 3.0