EDIT: I am fully aware that the function asmCopy might not be functional; my question is more about the behaviour of gcc regarding parameter passing in registers.
I'm working on an STM32H7 using STM32CubeIDE, whose compiler is arm-none-eabi-gcc.
The optimisation level is -Os.
I see the following behaviour that I cannot explain. I took screen captures to show the asm and C code side by side.
My C code is calling 3 functions. The first and the third one have exactly the same parameters.
The second one takes no parameters. Here is its code:
static void Reset_Cycle_Counter(void)
{
    volatile unsigned long *DWT_CYCCNT = (volatile unsigned long *)0xE0001004;
    volatile unsigned long *DWT_CONTROL = (volatile unsigned long *)0xE0001000;
    // Reset cycle counter
    *DWT_CONTROL = *DWT_CONTROL & ~0x00000001; // disable the counter
    *DWT_CYCCNT = 0;                           // clear it
    *DWT_CONTROL = *DWT_CONTROL | 1;           // re-enable the counter
}
The third function is particular: I am trying to write some assembly code (that may very well be wrong right now).
static void __attribute__((noinline)) asmCopy(void *dst, void *src, uint32_t bytes)
{
    while (bytes--)
    {
        asm("ldrb r12,[r1], #1"); // src param is stored in r1, r12 can be modified without being restored after
        asm("strb r12,[r0], #1"); // dst param is stored in r0
    }
}
Before the first function call (to memcpy), r0, r1 and r2 are loaded with the right values.
Then, before the call to the third function, the parameters in r1 and r2 are wrong (qspi_addr should be 0x90000000).
My understanding of AAPCS (procedure call standard on ARM) is that before calling a subroutine, the registers r0 to r3 should be loaded with the parameters of the functions (if any). And the subroutine does not need to preserve or restore these registers. It is then normal that the second function modifies r1 and r2. So I would expect the compiler to update r0, r1 and r2 before the third call.
If I change the optimisation level to -O0, I indeed get this expected behaviour.
What do you think?
You can't just open an inline assembly block and assume that r0 and r1 still contain the function arguments. There is no guarantee of that whatsoever. If you need to use the arguments, you need to pass them properly as input and/or output operands:
static void __attribute__((noinline))
myAsmCopy(void* dst, void* src, uint32_t bytes) {
    asm volatile("1: cbz %[bytes], 1f \n"
                 "ldrb r12, [%[src]], #1 \n"
                 "strb r12, [%[dst]], #1 \n"
                 "subs %[bytes], #1 \n"
                 "b 1b \n"
                 "1: \n"
                 : [dst] "+&r"(dst), [src] "+&r"(src), [bytes] "+&r"(bytes)
                 :
                 : "cc", "memory", "r12");
}
GCC has some extensive documentation about inline assembly here: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
As you've obviously never used any of that before, I must strongly advise against it. If "C contains footguns", then inline assembly is putting a 6-shot revolver with 5 bullets to your head.
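If a small piece of inline assembly is still wanted, one lower-risk variant is to keep the loop in C and let the compiler choose every register, including the scratch one. The sketch below is only an illustration under assumptions: the name asmCopyOperands and the preprocessor guard are hypothetical, the ARM branch has not been run on an STM32H7 here, and on any other target the function falls back to plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: copies `bytes` bytes from src to dst. */
void asmCopyOperands(void *dst, const void *src, uint32_t bytes)
{
    /* Precondition check, compiled out with -DNDEBUG. */
    assert(bytes == 0 || (dst != NULL && src != NULL));

    uint8_t *d = dst;
    const uint8_t *s = src;

    while (bytes--) {
#if defined(__thumb2__) || (defined(__arm__) && !defined(__thumb__))
        /* Assumed guard: post-indexed ldrb/strb need ARM or Thumb-2 state. */
        uint32_t tmp; /* scratch register picked by the compiler, no hard-coded r12 */
        __asm__ volatile(
            "ldrb %[tmp], [%[s]], #1 \n\t" /* tmp = *s++ */
            "strb %[tmp], [%[d]], #1 \n\t" /* *d++ = tmp */
            : [tmp] "=&r"(tmp), [s] "+r"(s), [d] "+r"(d)
            :
            : "memory");
#else
        *d++ = *s++; /* portable fallback for every other target */
#endif
    }
}
```

Because the pointers are now operands, the compiler knows the asm statement reads and updates them and must place them in registers of its own choosing. A basic asm statement without operands, as in the question, gives it no such information.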
If you ask the compiler how to achieve it, everything gets much easier:
void __attribute__((noinline)) asmCopy(void *dst, void *src, uint32_t bytes)
{
    while (bytes--)
    {
        asm("ldrb r12,[r1], #1"); // src param is stored in r1, r12 can be modified without being restored after
        asm("strb r12,[r0], #1"); // dst param is stored in r0
    }
}
void __attribute__((noinline)) asmCopy1(void *dst, void *src, uint32_t bytes)
{
    while (bytes--)
    {
        *(uint8_t *)dst++ = *(uint8_t *)src++;
    }
}
and the generated code:
asmCopy:
.L2:
        adds r2, r2, #-1
        bcs .L3
        bx lr
.L3:
        ldrb r12,[r1], #1
        strb r12,[r0], #1
        b .L2
asmCopy1:
        subs r0, r0, #1
        add r2, r2, r1
.L5:
        cmp r1, r2
        bne .L6
        bx lr
.L6:
        ldrb r3, [r1], #1 @ zero_extendqisi2
        strb r3, [r0, #1]!
        b .L5
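Note that asmCopy1 increments void * pointers, which GCC accepts as an extension but standard C does not. A minimal sketch of the same loop with typed pointers follows; the name cCopy and the const qualifier on src are assumptions added for illustration, and the code GCC generates for it may differ from the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: byte-wise copy without arithmetic on void pointers. */
void cCopy(void *dst, const void *src, uint32_t bytes)
{
    /* Precondition check, compiled out with -DNDEBUG. */
    assert(bytes == 0 || (dst != NULL && src != NULL));

    uint8_t *d = dst;       /* typed views of the untyped arguments */
    const uint8_t *s = src;

    while (bytes--) {
        *d++ = *s++;        /* copy one byte and advance both pointers */
    }
}
```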
I think I've found the answer.
In the function I am testing (whether it is the crappy one I've implemented, or the better one from @Vinci) some parameters passed to the function are global variables (arrays of dummy data to run some tests).
My understanding is that the compiler "modifies" the prototype of the function to build a function taking only one parameter. The other parameters are considered as constants and are simply loaded PC-relative at the beginning of the function.
So I modified the code to call the very same function, but with local volatile pointers, and the issue disappears: I can see registers r0, r1 and r2 loaded with the parameters as I expected.
Does it make sense?
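This experiment can be reproduced with a sketch like the following, meant to be compiled with -Os -S and inspected. All names (copyBytes, srcData, dstData, callWithGlobals, callWithVolatileLocals, dstMatchesSrc) are hypothetical stand-ins, not the code from the question. When only the first caller is present, GCC is free to specialise the static function for its constant arguments (such specialised copies typically show up with a .constprop or .isra suffix in the assembly). With the second caller it has to load each argument at run time. Whether this actually happens depends on the GCC version and flags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

uint8_t srcData[8] = {1, 2, 3, 4, 5, 6, 7, 8}; /* hypothetical dummy data */
uint8_t dstData[8];

static void __attribute__((noinline))
copyBytes(void *dst, const void *src, uint32_t bytes)
{
    assert(bytes <= sizeof dstData); /* compiled out with -DNDEBUG */
    uint8_t *d = dst;
    const uint8_t *s = src;
    while (bytes--) {
        *d++ = *s++;
    }
}

/* Every argument is a constant known at compile or link time. */
void callWithGlobals(void)
{
    copyBytes(dstData, srcData, sizeof srcData);
}

/* Reading volatile locals hides the values from the optimiser. */
void callWithVolatileLocals(void)
{
    void *volatile dst = dstData;
    const void *volatile src = srcData;
    volatile uint32_t bytes = sizeof srcData;
    copyBytes(dst, src, bytes);
}

/* Helper for checking the result: returns 1 when the buffers are equal. */
int dstMatchesSrc(void)
{
    return memcmp(dstData, srcData, sizeof srcData) == 0;
}
```

GCC's -fno-ipa-cp and -fno-ipa-sra options are another way to experiment with this behaviour without touching the call site.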