I've written a bootloader for my embedded project. It works in a debug build, and also in a debug build with optimizations on (-Os). However, entering the main firmware in a release build (-Os, NDEBUG etc.) fails: the call ends in a hard fault. As far as I can trust my debugger with all optimizations on, the jump to the entry point appears to end up at 0x00000000 instead. Interestingly, if I just add a check that the entry point is not zero, everything works, so this is a Heisenbug.
Compiler is arm-none-eabi-gcc.
The code that enters the main firmware is:
//Defined by the linker scripts
//Constants
extern uint32_t __MAIN_APP_BEGIN;
using jump_ptr_t = void (*)(uint32_t);
...snip...
void Image::run_image(uint32_t param) const {
    //Access the main app vector table
    const uint32_t *main_vectors = &__MAIN_APP_BEGIN;
    //First entry is the initial stack pointer
    const auto stack_init = reinterpret_cast<uint32_t>(main_vectors[0]);
    //mEntryPoint, a uint32_t has been read earlier from the firmware image header,
    //checked to make sense (lies within the firmware image region etc)
    const auto entry_point = reinterpret_cast<jump_ptr_t>(mEntryPoint);
    //Sanity check (left out in release)
    assert(is_valid() && entry_point != nullptr && stack_init != 0);
    //If I uncomment the following line, this works
    /*if (mEntryPoint == 0)
        return;*/
    //Set the stack pointer for the process we are going to
    //NOTE: OUR OWN STACK IS NOW CORRUPT
    //so we MUST NOT RETURN!
    Hardware::set_MSP(stack_init);
    entry_point(param);
    //No returning, stack is broken
    assert(false);
}
Now, I suppose that using the pointer acquired via reinterpret_cast is undefined behaviour, so the optimizer doesn't think it necessary to actually load the value from memory etc. Actually, I just realized that I'm probably also relying on the address having already been loaded into a register, since I destroy the local stack on the line just before the jump...
The question: how do I call my firmware without invoking undefined behaviour in C++? Or does it inevitably require inline assembler?
Update:
Here's the last few lines of disassembly without the check, just before the jump:
msr MSP, r3
ldr r0, [r6, #0]
ldr r3, [sp, #40] ; 0x28
blx r3
and here with the check (i.e. the case that works):
ldr r3, [sp, #40] ; 0x28
ldr r2, [r5, #0]
msr MSP, r2
cmp r3, #0
beq.n 0x800051c <_start()+760>
ldr r0, [r6, #0]
blx r3
So it really appears that even with the check, it works at best by dumb luck, due to the different order of stack loads vs. setting the stack pointer.
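To make the ordering problem concrete, here is a minimal sketch of the inline-assembler route, assuming a Cortex-M target and GCC extended asm. The names jump_to_image and has_thumb_bit are hypothetical and not part of my code above. The idea is that the stack pointer write and the branch sit in one asm statement whose operands are already in registers, so the compiler cannot schedule a stack-relative load between them. It also assumes the entry point address has bit 0 set (Thumb state). On a non-ARM host the function just aborts so the file still compiles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: a Thumb entry point must have bit 0 set.
constexpr bool has_thumb_bit(std::uint32_t entry_point) {
    return (entry_point & 1u) != 0u;
}

// Hypothetical replacement for the last lines of run_image().
[[noreturn]] inline void jump_to_image(std::uint32_t stack_init,
                                       std::uint32_t entry_point,
                                       std::uint32_t param) {
    // Debug-only check, compiled out under NDEBUG like the assert above.
    assert(has_thumb_bit(entry_point));
#if defined(__arm__) || defined(__thumb__)
    __asm__ volatile(
        "msr msp, %[sp]  \n\t"  // switch to the application's stack
        "mov r0, %[arg]  \n\t"  // first argument goes in r0
        "bx  %[pc]       \n\t"  // branch to the entry point, never returns
        :
        : [sp] "r"(stack_init), [arg] "r"(param), [pc] "r"(entry_point)
        : "r0", "memory");
    __builtin_unreachable();
#else
    // Host build: there is no MSP to set, so just stop here.
    (void)stack_init;
    (void)entry_point;
    (void)param;
    std::abort();
#endif
}
```

In run_image() this would stand in for the Hardware::set_MSP(stack_init); entry_point(param); pair, called as jump_to_image(stack_init, mEntryPoint, param), so no function pointer cast is needed at all.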