Understanding gcc behaviour regarding AAPCS (on STM32)

2

EDIT: I am fully aware that the function asmCopy might not be functional; my question is more about the behaviour of gcc regarding parameter passing in registers.

I'm working on an STM32H7 using STM32CubeIDE, which builds with arm-none-eabi-gcc.

The optimisation level is -Os.

I see the following behaviour, which I cannot explain. I took screenshots to show the asm and C code side by side.

My C code is calling 3 functions. The first and the third one have exactly the same parameters.

The second one takes no parameters. Here is its code:

static void Reset_Cycle_Counter(void)
{
    volatile uint32_t *DWT_CYCCNT = (volatile uint32_t *)0xE0001004;
    volatile uint32_t *DWT_CONTROL = (volatile uint32_t *)0xE0001000;

    // Reset cycle counter
    *DWT_CONTROL = *DWT_CONTROL & ~0x00000001 ;
    *DWT_CYCCNT = 0;
    *DWT_CONTROL = *DWT_CONTROL | 1 ;
}

The third function is special: I am trying to write some assembly code (which may very well be wrong right now).

static void __attribute__((noinline)) asmCopy(void *dst, void *src, uint32_t bytes)
{
    while (bytes--)
    {
        asm("ldrb r12,[r1], #1"); // src param is stored in r1, r12 can be modified without being restored after
        asm("strb r12,[r0], #1"); // dst param is stored in r0
    }
}

Before the first function call (to memcpy), r0, r1 and r2 are loaded with the right values.

[screenshot: state before the call to memcpy]

Then, before the call to the third function, the parameters in r1 and r2 are wrong, as you can see below (qspi_addr should be 0x90000000).

[screenshot: state before the call to asmCopy]

My understanding of the AAPCS (the procedure call standard on ARM) is that, before calling a subroutine, registers r0 to r3 must be loaded with the function's parameters (if any), and that the subroutine does not need to preserve or restore these registers. It is therefore normal for the second function to modify r1 and r2, so I would expect the compiler to reload r0, r1 and r2 before the third call.
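The call pattern described above can be sketched in portable C as follows. This is a hypothetical reconstruction, not the code from the screenshots: `ram_buf`, `flash_img`, `reset_counter_stub`, `byte_copy` and `call_sequence` are made-up names, and `flash_img` merely stands in for the memory-mapped QSPI region at 0x90000000. Under the AAPCS, r0 to r3 (and r12) are caller-saved, so a compiler that treats `byte_copy` as an ordinary three-argument function has to set up r0 to r2 again after the middle call.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical buffers; flash_img stands in for QSPI memory at 0x90000000. */
static uint8_t ram_buf[8];
static const uint8_t flash_img[8] = {1, 2, 3, 4, 5, 6, 7, 8};

/* Stand-in for the DWT cycle counter so the sketch also runs on a host. */
static volatile uint32_t fake_counter;

/* Takes no parameters, but is free to clobber r0-r3 and r12 under the AAPCS. */
static void __attribute__((noinline)) reset_counter_stub(void)
{
    fake_counter = 0;
}

/* Plain C copy with the same three-parameter shape as asmCopy. */
static void __attribute__((noinline)) byte_copy(void *dst, const void *src,
                                                uint32_t bytes)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    while (bytes--)
        *d++ = *s++;
}

static void call_sequence(void)
{
    memcpy(ram_buf, flash_img, sizeof ram_buf);    /* arguments in r0, r1, r2 */
    reset_counter_stub();                          /* may overwrite r0-r3, r12 */
    byte_copy(ram_buf, flash_img, sizeof ram_buf); /* r0-r2 cannot be assumed intact */
}
```

Comparing the disassembly of `call_sequence` at -O0 and -Os is one way to see how the argument setup before the third call changes with the optimisation level.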

If I change the optimisation level to -O0, I do get this expected behaviour.

What do you think?

gcc
arm
stm32
asked on Stack Overflow Jun 26, 2019 by Guillaume Petitjean • edited Jun 26, 2019 by Guillaume Petitjean

3 Answers

3

You can't just open an inline assembly block and assume that r0 and r1 still contain the function arguments. There is no guarantee of that whatsoever. If you need to use the arguments, you need to pass them properly as input and/or output operands:

static void __attribute__((noinline))
myAsmCopy(void* dst, void* src, uint32_t bytes) {
  asm volatile("1: cbz %[bytes], 1f \n"
               "ldrb r12, [%[src]], #1 \n"
               "strb r12, [%[dst]], #1 \n"
               "subs %[bytes], #1 \n"
               "b 1b \n"
               "1: \n"
               : [dst] "+&r"(dst), [src] "+&r"(src), [bytes] "+&r"(bytes)
               :
               : "cc", "memory", "r12");
}
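As a variation on the block above, the scratch register can also be left to the compiler instead of hard-coding r12. The sketch below assumes a Thumb-2 target (such as the Cortex-M7 in the STM32H7) and falls back to plain C elsewhere; the function name `copy_bytes_operands` and the operand names are invented for illustration. The `l` constraint asks for a low register (r0 to r7) in Thumb state, which is what `cbz` requires.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the original answer. */
static void __attribute__((noinline))
copy_bytes_operands(void *dst, const void *src, uint32_t bytes)
{
#if defined(__thumb2__)
    uint32_t tmp; /* scratch value; the compiler picks the register */
    __asm__ volatile(
        "1: cbz   %[n], 2f            \n" /* leave the loop when n == 0 */
        "   ldrb  %[tmp], [%[s]], #1  \n" /* load a byte, post-increment src */
        "   strb  %[tmp], [%[d]], #1  \n" /* store it, post-increment dst */
        "   subs  %[n], %[n], #1      \n"
        "   b     1b                  \n"
        "2:                           \n"
        : [d] "+&r"(dst), [s] "+&r"(src), [n] "+&l"(bytes), [tmp] "=&r"(tmp)
        :
        : "cc", "memory");
#else
    /* Host fallback so the sketch compiles and runs off-target. */
    uint8_t *d = dst;
    const uint8_t *s = src;
    while (bytes--)
        *d++ = *s++;
#endif
}
```

Because every register the asm touches is declared as an operand, the compiler knows what is read and written and can allocate registers however it likes.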

GCC has some extensive documentation about inline assembly here: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

As you've obviously never used any of that before, I must strongly advise against it. If "C contains footguns", then inline assembly is putting a 6-shot revolver with 5 bullets to your head.

answered on Stack Overflow Jun 26, 2019 by Vinci
0

If you ask the compiler how to achieve it, everything gets much easier:

https://godbolt.org/z/rXxeRe

void __attribute__((noinline)) asmCopy(void *dst, void *src, uint32_t bytes)
{
    while (bytes--)
    {
        asm("ldrb r12,[r1], #1"); // src param is stored in r1, r12 can be modified without being restored after
        asm("strb r12,[r0], #1"); // dst param is stored in r0
    }
}

void __attribute__((noinline)) asmCopy1(void *dst, void *src, uint32_t bytes)
{
    while (bytes--)
    {
        *(uint8_t *)dst++ = *(uint8_t *)src++;
    }
}

and the generated code:

asmCopy:
.L2:
        adds    r2, r2, #-1
        bcs     .L3
        bx      lr
.L3:
        ldrb r12,[r1], #1
        strb r12,[r0], #1
        b       .L2
asmCopy1:
        subs    r0, r0, #1
        add     r2, r2, r1
.L5:
        cmp     r1, r2
        bne     .L6
        bx      lr
.L6:
        ldrb    r3, [r1], #1    @ zero_extendqisi2
        strb    r3, [r0, #1]!
        b       .L5
answered on Stack Overflow Jun 26, 2019 by 0___________
0

I think I've found the answer.

In the function I am testing (whether it is the crappy one I've implemented, or the better one from @Vinci) some parameters passed to the function are global variables (arrays of dummy data to run some tests).

My understanding is that the compiler "modifies" the prototype of the function to build a function taking only one parameter. The other parameters are treated as constants and simply loaded PC-relative at the beginning of the function.

So I modified the code to call the very same function, but with local volatile pointers, and the issue disappears: I can see registers r0, r1 and r2 loaded with the parameters as I expected.

Does it make sense?
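A minimal sketch of that experiment, assuming a plain C copy routine in place of the real function under test: `copy_under_test`, `run_copy_with_opaque_args`, `test_src`, `test_dst` and `TEST_LEN` are hypothetical names. The observation is consistent with GCC's interprocedural constant propagation, which may specialise a static function when every call site passes the same constants (such clones typically show up with a `.constprop` suffix in the disassembly). Reading the arguments from volatile locals means their values are not compile-time constants at the call site, so they have to be passed in registers.

```c
#include <stdint.h>

#define TEST_LEN 32u /* hypothetical test size */

/* Global dummy data, as in the original test setup. */
static uint8_t test_src[TEST_LEN];
static uint8_t test_dst[TEST_LEN];

/* Stand-in for the function being measured. */
static void __attribute__((noinline)) copy_under_test(void *dst, const void *src,
                                                      uint32_t bytes)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    while (bytes--)
        *d++ = *s++;
}

static void run_copy_with_opaque_args(void)
{
    /* volatile objects must really be read, so the optimiser cannot
       treat the arguments as known constants. */
    void *volatile dst = test_dst;
    const void *volatile src = test_src;
    volatile uint32_t len = TEST_LEN;

    copy_under_test(dst, src, len);
}
```

Dropping `static` from the function under test, so that it may have callers the compiler cannot see, is another option that could be tried; whether it changes the generated code depends on the compiler version and flags.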

answered on Stack Overflow Jun 27, 2019 by Guillaume Petitjean

User contributions licensed under CC BY-SA 3.0