Strange content when debugging some Armv5 assembly code

I am trying to learn ARM by debugging a simple piece of ARM assembly.

    .global start, stack_top
start:
    ldr sp, =stack_top
    bl main
    b .

The linker script looks like this:

    ENTRY(start)
    SECTIONS
    {
        . = 0x10000;
        .text : {*(.text)}
        .data : {*(.data)}
        .bss : {*(.bss)}
        . = ALIGN(8);
        . = . +0x1000;
        stack_top = .;
    }
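As a rough sketch of what the last three lines of the script do, the location-counter arithmetic can be modelled in Python. The end-of-.bss address 0x10520 is an assumption, back-computed from the 0x11520 value observed later in the debug session (0x11520 - 0x1000); the real value depends on the sizes of .text, .data and .bss.

```python
def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two), like ALIGN(n)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def stack_top_from(end_of_bss: int, stack_size: int = 0x1000) -> int:
    """Mimic the script's tail: . = ALIGN(8); . = . + 0x1000; stack_top = .;"""
    location = align_up(end_of_bss, 8)  # . = ALIGN(8);
    location += stack_size              # . = . + 0x1000;
    return location                     # stack_top = .;


# Assumed end of .bss (hypothetical, not stated in the question).
ASSUMED_END_OF_BSS = 0x10520
print(hex(stack_top_from(ASSUMED_END_OF_BSS)))  # 0x11520
```

Because ALIGN(8) only rounds up, any end-of-.bss address from 0x10519 to 0x10520 would yield the same stack_top.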

I run this on the QEMU ARM emulator. The binary is loaded at 0x10000, so I put a breakpoint there. As soon as the breakpoint was hit, I checked the pc register: its value is 0x10000. Then I disassembled the instruction at 0x10000.

I see a strange comment, ; 0x1000c <start+12>. What does it mean, and where does it come from?

    Breakpoint 1, 0x00010000 in start ()
    (gdb) i r pc
    pc             0x10000  0x10000 <start>
    (gdb) x /i 0x10000
    => 0x10000 <start>:     ldr     sp, [pc, #4]    ; 0x1000c <start+12> <========= HERE
    (gdb) x /i 0x10004
       0x10004 <start+4>:   bl      0x102b0 <main>

Then I continued debugging, because I wanted to see the effect of the ldr sp, [pc, #4] at 0x10000 on the sp register.

From the disassembly above, I expected the value of sp to be [pc + 4], which should be the content located at 0x10000 + 4 = 0x10004. But sp turns out to be 0x11520:

    (gdb) i r sp
    sp             0x0      0x0
    (gdb) si
    0x00010004 in start ()
    (gdb) x /i $pc
    => 0x10004 <start+4>:   bl      0x102b0 <main>
    (gdb) i r sp
    sp             0x11520  0x11520 <=================== HERE
    (gdb) x /x &stack_top  
    0x11520:        0x00000000

So the 0x11520 value does come from the linker script symbol stack_top. But how is it related to the ldr sp, [pc,#4] instruction at 0x10000?

ADD 1 - 9:29 AM 12/20/2019

Many thanks for the detailed answer by @old_timer.

I was reading the book Embedded and Real-Time Operating Systems by K. C. Wang, and I learned about the pipeline from it. The relevant passage is quoted below:

[Image: ARM instruction pipeline]

So, if the pipeline is no longer relevant today, what makes the pc value two instructions ahead of the currently executing instruction?

I just found the following thread addressing this issue:

Why does the ARM PC register point to the instruction after the next one to be executed?

Basically, it is just another case of people creating mistakes, flaws and pitfalls for themselves as they advance the technology.

So, back to this question:

  • My assembly uses pc-relative addressing.
  • ARM's PC is two instructions ahead of the currently executing instruction. (And you have to deal with that!)
Tags: arm, gdb
asked on Stack Overflow Dec 19, 2019 by smwikipedia • edited Dec 20, 2019 by smwikipedia

3 Answers

When an instruction (e.g. ldr or mov) reads the pc, an offset of 8 is added in ARM (A32) mode and an offset of 4 in Thumb (T32) mode. IIRC this is because of the way function calls worked in old ARM versions. This is documented, e.g., in the ARMv7-A Architecture Reference Manual, chapter A2.3, p. A2-45.

The comment ; 0x1000c <start+12> is indeed generated by the disassembler, to indicate the address calculated by PC+4, where PC reads as 0x10000 + 8 = 0x10008.
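The address in that comment can be reproduced with a small Python model of the rule above. It is only a sketch built on the stated offsets (+8 in A32, +4 in Thumb); the word-alignment step for Thumb applies to LDR (literal) and is an addition not mentioned in this answer. The function names are hypothetical.

```python
def pc_read_value(insn_addr: int, thumb: bool) -> int:
    """Value an instruction sees when it reads PC as an operand."""
    return insn_addr + (4 if thumb else 8)


def ldr_literal_address(insn_addr: int, imm: int, thumb: bool = False) -> int:
    """Address accessed by 'LDR Rt, [PC, #imm]' located at insn_addr."""
    base = pc_read_value(insn_addr, thumb)
    if thumb:
        # Thumb LDR (literal) uses the PC value rounded down to a word boundary.
        base &= ~3
    return base + imm


print(hex(ldr_literal_address(0x10000, 4)))  # 0x1000c, matching <start+12>
```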

Side note: ldr <register>, =<value> is not an actual instruction; the assembler translates it into one or two instructions, plus optionally a literal value, to obtain the desired value in the most efficient way.
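One reason a literal is needed: an A32 mov can only encode a "modified immediate", i.e. an 8-bit value rotated right by an even amount. The Python sketch below models just that encoding rule (it does not model which sequence a particular assembler picks). Note also that stack_top is a linker-defined symbol, so its value is unknown when the assembler runs; that alone forces a literal-pool load with a relocation. Even if the value were known, 0x11520 would fail this check.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def is_a32_immediate(value: int) -> bool:
    """True if value == imm8 rotated right by 2*r for some imm8 <= 0xFF and r in 0..15."""
    value &= MASK32
    # Undo each possible even rotation and see whether an 8-bit value remains.
    return any(rotl32(value, 2 * r) <= 0xFF for r in range(16))


for candidate in (0xFF, 0x104, 0xFF000000, 0x102, 0x11520):
    print(hex(candidate), is_a32_immediate(candidate))
```

0x102 is rejected because its set bits (1 and 8) do not fit in any 8-bit window starting at an even bit position; 0x11520 is rejected because its set bits span 12 bits.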

If you are interested in that, I wrote a tutorial on learning ARM assembly step-by-step on Cortex-M.

answered on Stack Overflow Dec 19, 2019 by Erlkoenig
    .global start, stack_top
start:
    ldr sp, =stack_top
    bl main
    b .

Assuming arm mode, you have three instructions there, so the first possible place for the pool holding the stack_top value is after the b .

    _start: ( 0x00000000 )
    0x00000000  ldr sp,=stack_top
    0x00000004  bl main
    0x00000008  b .
    0x0000000c  stack_top

And from what you have shown, this is where the assembler allocated that space.

So _start + 12 is the location of the stack_top VALUE. The pseudo-instruction ldr sp,=stack_top gets turned into either a mov or a pc-relative load. The pc is two ahead for historical reasons which have zero relevance today. On some architectures the pc is the address of the current instruction, on some it is the address of the next instruction (variable length or not), and in the case of arm (aarch32) and thumb it is "two ahead", so 8 in arm mode. So a pc-relative load for an instruction at address 0x00000000 to reach 0x0000000C needs an offset of 0xC - 8 = 4, hence ldr sp,[pc,#4].
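To make the 0xC - 8 = 4 arithmetic concrete, here is a sketch that encodes the resulting instruction word in Python. It assumes the A32 LDR (immediate offset, PC as base) layout with the AL condition; the function name is made up for illustration. For the question's layout it produces 0xE59FD004, the A32 encoding of ldr sp, [pc, #4].

```python
def encode_ldr_pc_relative(rt: int, insn_addr: int, literal_addr: int) -> int:
    """Encode A32 'LDR Rt, [PC, #+/-imm12]' (condition AL) that loads from literal_addr."""
    offset = literal_addr - (insn_addr + 8)  # PC reads 8 bytes ahead in ARM mode
    up = 1 if offset >= 0 else 0             # U bit: add or subtract the offset
    imm12 = abs(offset)
    if imm12 > 0xFFF:
        raise ValueError("literal is out of range of a single LDR")
    # B = 0 (word access) and W = 0 (no writeback) are left clear.
    return (
        (0xE << 28)        # cond = AL (always)
        | (0b010 << 25)    # load/store with immediate offset
        | (1 << 24)        # P = 1: offset addressing
        | (up << 23)       # U: add or subtract imm12
        | (1 << 20)        # L = 1: load
        | (15 << 16)       # Rn = PC
        | (rt << 12)       # Rt = destination register (13 is sp)
        | imm12
    )


print(hex(encode_ldr_pc_relative(13, 0x10000, 0x1000C)))  # 0xe59fd004
```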

Now the CONTENTS at that address are, as you asked in the linker script, computed by the linker at link time. You put some code in there and then padded some stuff. You didn't show the rest of your code (you could have made this a complete example), but either way, from your post, the linker ended up computing 0x11520.

So, reverse engineering your question and comments, we see that the binary starts with (once linked):

    _start: ( 0x00010000 )
    0x00010000  ldr sp,[pc, #4]
    0x00010004  bl main
    0x00010008  b .
    0x0001000c  0x11520

In arm mode, the first instruction will load the value 0x11520 into the stack pointer, as you asked. Nothing strange or wrong here.

The 0x1000C <_start + 12> comment simply states that the address 0x1000C is at an offset of 12 from the nearest label, _start. Sometimes that is useful information.

If you use the pseudo-instruction without defining a pool, the assembler will attempt to find a home for the literal on its own. If you added a nop or some other code:

    .global start, stack_top
start:
    ldr sp, =stack_top
    bl main
    nop
    b .

then it is likely the assembler would now put the literal at pc + 8 (ldr sp,[pc,#8]), which after linking would be 0x10010. If nothing else changes, the stack pointer MIGHT end up with the same value or 4 (or more) bytes further along, depending on the alignment and padding applied by the tools along the way.
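A sketch of that offset arithmetic in Python, assuming the assembler places the literal pool immediately after the last instruction with no extra alignment padding (real assemblers may pad). The function name is hypothetical.

```python
INSN_SIZE = 4  # every A32 instruction is 4 bytes


def pool_offset(ldr_index: int, insns_before_pool: int) -> int:
    """Offset for 'LDR Rt, [PC, #offset]' when the literal sits right after the code.

    ldr_index: position of the ldr in the instruction stream (0 = first).
    insns_before_pool: how many instructions precede the literal pool.
    """
    ldr_addr = ldr_index * INSN_SIZE
    literal_addr = insns_before_pool * INSN_SIZE
    return literal_addr - (ldr_addr + 8)  # PC reads 8 bytes ahead in ARM mode


print(pool_offset(0, 3))  # ldr, bl, b .      -> 4
print(pool_offset(0, 4))  # ldr, bl, nop, b . -> 8
```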

The point being: the pipe no longer works that way, if it ever did in real products, so don't think of this as a pipe thing, any more than the branch shadow instructions in MIPS mean anything relevant today (when enabled).

For every instruction set that has pc-relative addressing you need to know the rule: is it the address of the instruction (less common), the address of the next instruction (most common), two ahead, or something else? Likewise, for a while folks hardcoded "8 bytes ahead" in their brains rather than "two ahead", and had issues when they switched to thumb. Now, of course, there are the thumb2 extensions, which break the "two ahead" way of thinking. I don't offhand know the aarch64 rule; I would hope it is the next instruction and not infected with the two ahead from aarch32. But as with arm (A32) and thumb (T16 and T32), it is easy to find this information in the arm documentation, which, as a rule for any architecture, you should have handy when writing or analyzing machine/assembly language.
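Why "two ahead" breaks down with Thumb-2 can be shown with a small Python model, assuming the documented Thumb rule that PC reads as the instruction's own address plus 4. The instruction stream and its widths are hypothetical, chosen only for illustration.

```python
def thumb_pc_value(insn_addr: int) -> int:
    """PC as read by a Thumb instruction: insn_addr + 4, whatever the instruction width."""
    return insn_addr + 4


def two_ahead(addresses: list[int], index: int) -> int:
    """Address of the instruction two positions after addresses[index]."""
    return addresses[index + 2]


# Hypothetical Thumb-2 stream: one 32-bit instruction followed by 16-bit ones.
widths = [4, 2, 2, 2]
addresses = []
addr = 0x100
for w in widths:
    addresses.append(addr)
    addr += w

print(hex(thumb_pc_value(addresses[0])), hex(two_ahead(addresses, 0)))
```

For the instruction at 0x100, "two instructions ahead" would be 0x106 and "8 bytes ahead" would be 0x108, but the +4 rule gives 0x104, which is what the instruction actually reads.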

answered on Stack Overflow Dec 19, 2019 by old_timer

(I think I can explain it now. If I am wrong, please feel free to correct me.)

I tried a slightly different version of the assembly with one more label, shown below:

    .global start, stack_top, label2   @ <========== HERE I add a new label2
start:
    ldr sp, =stack_top   @ sp = &stack_top; as soon as we have the stack ready, we can call a C function
label2:
    bl main
    b .

The new debug session looks like this:

    Breakpoint 1, 0x00010000 in start ()
    (gdb) i r pc
    pc             0x10000  0x10000 <start>
    (gdb) x /i $pc  <======== (1)
    => 0x10000 <start>:     ldr     sp, [pc, #4]    ; 0x1000c <label2+8> <======= (2)
    (gdb) i r sp
    sp             0x0      0x0
    (gdb) si
    0x00010004 in label2 ()
    (gdb) x /i $pc
    => 0x10004 <label2>:    bl      0x102b0 <main>
    (gdb) i r sp
    sp             0x11520  0x11520
    (gdb) x /x 0x1000c    <========== (3)
    0x1000c <label2+8>:     0x00011520
    (gdb) x /x &stack_top <========== (4)
    0x11520:        0x00000000

Though at line (1) I seem to be asking for the pc value, and at line (2) it does give me the value 0x10000, that is actually NOT the real pc value at that moment.

This is because the ARM processor has a fetch-decode-execute pipeline: while one instruction is being executed, the next one is being decoded and the one after that is being fetched.

So pc actually points to the instruction being fetched. The currently executing instruction at 0x10000 is at pc - 8, since I am using ARM mode instructions and each instruction takes 4 bytes. So the actual pc value is 0x10008.
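This reasoning can be modelled in a few lines of Python, assuming the classic three-stage fetch/decode/execute pipeline of early ARM cores. As the other answer points out, modern cores do not work this way, but the arithmetic matches the architectural rule. The function name is hypothetical.

```python
def pipeline_snapshot(execute_addr: int, insn_size: int = 4) -> dict[str, int]:
    """Classic 3-stage pipeline: the address in each stage while execute_addr executes."""
    return {
        "execute": execute_addr,
        "decode": execute_addr + insn_size,
        "fetch": execute_addr + 2 * insn_size,  # the address PC held on such cores
    }


snap = pipeline_snapshot(0x10000)
for stage, addr in snap.items():
    print(f"{stage:8s}{addr:#x}")
```

Passing insn_size=2 (16-bit Thumb) puts the fetch stage at +4, consistent with the Thumb offset.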

So [pc, #4] gives 0x10008 + 4 = 0x1000C, which is exactly what the comment ; 0x1000c <label2+8> says. (This is pc-relative addressing, by the way; please read @old_timer's answer for more details about it.)

It seems gdb uses the nearest preceding label to represent the computed address, so here it chose label2. In my original question it chose start.
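A simplified Python model of that "nearest preceding label" lookup reproduces both comments. It is only a sketch: gdb's real symbol lookup also takes symbol and section information into account, and the function name is hypothetical.

```python
import bisect


def symbolize(addr: int, symbols: dict[str, int]) -> str:
    """Describe addr as '<label+offset>' using the nearest label at or below it."""
    by_addr = sorted((a, name) for name, a in symbols.items())
    starts = [a for a, _ in by_addr]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return f"{addr:#x}"  # no label precedes this address
    base, name = by_addr[i]
    offset = addr - base
    return f"{addr:#x} <{name}+{offset}>" if offset else f"{addr:#x} <{name}>"


print(symbolize(0x1000C, {"start": 0x10000}))                    # original question
print(symbolize(0x1000C, {"start": 0x10000, "label2": 0x10004}))  # with label2
```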

Lines (3) and (4) confirm that the memory location 0x1000c does hold the stack_top value.

To summarize, two things should be noted:

  • The ARM instruction pipeline
  • GDB's convenient display of an instruction's computed address as a comment

Last thought...

BTW, I think it would be much better if, when I dump the pc value at line (1), the real pc value for the fetched instruction, i.e. 0x10008, could be displayed. That would avoid a lot of confusion.


More thoughts...

Please read the thread below for why pc is two instructions ahead of the currently executing instruction:

Why does the ARM PC register point to the instruction after the next one to be executed?

Though the 3-stage fetch-decode-execute pipeline is no longer relevant (thanks to @old_timer), the calculation in the answer above is still mathematically valid, and so are the other parts.

answered on Stack Overflow Dec 19, 2019 by smwikipedia • edited Dec 20, 2019 by smwikipedia

User contributions licensed under CC BY-SA 3.0