I am trying to apply a type of side channel attack I read about in this paper that tries to infer execution state from differences in IRQ latencies on a MCU with a cortex M4 processor. The attack carefully interrupts instructions that occur right after a branch and measures the interrupt latency. When different branches have instructions of different lengths, you can look at the interrupt latency to determine in which of these branches the interrupt occurred and leak some of the program state.
I wrote a simple function that I want to attack in the way described above. I am using the SysTick timer to generate the interrupt at the correct point in time. To get an initial good value for the interrupt timer I used GDB to stop the program at the target line to see the SysTick value at that time.
I implemented a very simple interrupt handler that
void __attribute__((interrupt("IRQ"))) SysTick_Handler(void)
{
/* USER CODE BEGIN SysTick_IRQn 0 */
SysTick->CTRL &= 0xfffffffe; // disable SysTick (~SysTick_CTRL_ENABLE_Msk)
*timer_value = SysTick->VAL; // capture counter value (as quickly as possible)
*timer_value = SysTick->LOAD - *timer_value; // subtract it from reload value to get IRQ latency
SysTick->VAL = 0; // reset initial value
}
However I find that I always get the same IRQ latency, regardless of the instruction that was interrupted. I expect the interrupt latency to be longer when a longer instruction is interrupted.
This is the function I wrote to test the attack
extern uint32_t *timer_value;
int sample_function(int *a, int *b){
/*
* function description -- store the smallest of the two value in a, if MEASURE_CYCLESS defined return the number
* of clock cycles that have been elapsed since the timer has been started
* r0 contains pointer to a
* r1 contains pointer to b
*/
__asm volatile(
/* push working registers */
"PUSH {r4-r8} \n"
/* move counter into r8 */
"MOV r8, #10 \n"
/* begin loop */
"begin_loop: \n"
/* decrement counter variable*/
"SUB r8, r8, #1 \n"
/* if counter variable not equal to 0, jump back to start of loop */
"CMP r8, #0 \n"
/* if r8 not equal to 0, jump back to begin of loop*/
"BNE begin_loop \n"
/* load a into r2 */
"LDR r2, [r0] \n"
/* load b into r3 */
"LDR r3, [r1] \n"
/* store a-b in r4, setting status flags -- if result is 0 Z flag is set */
"SUBS r4, r2, r3 \n"
/* if a-b positive, a is larger otherwise, b is larger (assuming a not equal to b) */
"BPL a_larger \n"
#ifdef SPY
/* load address of (*timer_value) into r4 -- use of LDR pseudo-instruction places constant in a literal pool*/
"LDR r4, =timer_value \n"
/* Load (*timer_value) into r4 */
"LDR r4, [r4] \n"
/* load address of Systick VAL into r5 */
"LDR r5, =0xe000e018 \n"
/* Load value at address stored in R5 (= Systick Val) */
"LDR r5, [r5] \n"
/* Move Systick Val into adress stored at r4 (= *timer_value = address of timer_value)*/
"STR r5, [r4] \n"
#endif
"NOP \n"
/*instruction that gets interrupted -- swap value*/
"STR r2, [r1] \n"
/* load value at this address into r0 (return value) */
"STR r3, [r0] \n"
"B end \n"
"a_larger: \n"
"MOV r0, #0 \n" // instruction that gets interrupted
"end: POP {r4-r8}"
); // pop working registers
}
Note, the section of code in the #define
block is used to automatically determine a good timer reload value (instead of using GDB), but I'm currently not using the value I obtained this way.
I also have an empty loop in there to delay the instruction that is meant to be interrupted a bit.
The instruction that gets interrupted is the instruction right after the #define
block. When I remove the NOP
instruction I still get the same interrupt latency. When I increase or decrease the timer value (to interrupt some cycles earlier or later) I also still get the same IRQ latency.
Am I missing something here? Is there some behavior I do not know about?
Also, is it important to use the attribute __attribute__((interrupt("IRQ"))
for an interrupt handler?
This is what I was thinking and commenting on.
bootstrap
.thumb_func
reset:
bl notmain
ldr r4,=0xE000E018
ldr r0,=0xE000E010
mov r1,#7
str r1,[r0]
b hang
.thumb_func
hang:
nop
nop
nop
nop
nop
nop
nop
b hang
setup uart and systick
void notmain ( void )
{
uart_init();
hexstring(0x12345678);
PUT32(STK_CSR,4);
PUT32(STK_RVR,0xF40000);
PUT32(STK_CVR,0x00000000);
//PUT32(STK_CSR,7);
}
event handler
.thumb_func
.globl systick_handler
systick_handler:
ldr r0,[r4]
ldr r5,[sp,#0x18]
push {r0,lr}
bl hexstrings
mov r0,r5
bl hexstring
pop {r0,pc}
grab the timer and address of interrupted instruction and print them out.
00F3FFF4 08000054
00F3FFF4 08000056
00F3FFF4 08000058
00F3FFF4 0800005A
00F3FFF4 0800005C
00F3FFF4 0800005E
00F3FFF4 08000054
00F3FFF4 08000056
00F3FFF4 08000058
00F3FFF4 0800005A
00F3FFF4 08000050
08000050 <hang>:
8000050: bf00 nop
8000052: bf00 nop
8000054: bf00 nop
8000056: bf00 nop
8000058: bf00 nop
800005a: bf00 nop
800005c: bf00 nop
800005e: e7f7 b.n 8000050 <hang>
From ARM's documentation.
Interrupt Latency
There is a maximum of a twelve cycle latency from asserting the interrupt to execution of the first instruction of the ISR when the memory being accessed has no wait states being applied. When the FPU option is implemented and a floating point context is active and the lazy stacking is not enabled, this maximum latency is increased to twenty nine cycles. The first instructions to be executed are fetched in parallel to the stack push.
And that last line we can perhaps see happening here. You can try various instructions, but this architecture has the ability to restart the long duration instructions (reads and push/pop, multiply, and such). I think to see much of a latency difference you may need to create bus or shared resource contention (vs instructions)
Also systick is an exception not an interrupt, so there may be some differences with respect to latency.
User contributions licensed under CC BY-SA 3.0