Why is the IRQ latency in my ARM interrupt handler always the same, regardless of the instruction that is being interrupted?

I am trying to apply a side-channel attack I read about in this paper, which infers execution state from differences in IRQ latency on an MCU with a Cortex-M4 processor. The attack carefully interrupts instructions that occur right after a branch and measures the interrupt latency. When the branches contain instructions of different lengths, the interrupt latency reveals in which branch the interrupt occurred, leaking some of the program state.

I wrote a simple function that I want to attack in the way described above. I am using the SysTick timer to generate the interrupt at the correct point in time. To get a good initial value for the interrupt timer, I used GDB to stop the program at the target line and read the SysTick value at that point.

I implemented a very simple interrupt handler that

  1. loads the SysTick timer value from memory
  2. subtracts this value from the reload value to get the elapsed time since interrupt (i.e. the IRQ latency)
  3. clears the interrupt

void __attribute__((interrupt("IRQ"))) SysTick_Handler(void)
{
  /* USER CODE BEGIN SysTick_IRQn 0 */
    SysTick->CTRL &= 0xfffffffe;                                // disable SysTick (~SysTick_CTRL_ENABLE_Msk)
    *timer_value = SysTick->VAL;                                // capture counter value (as quickly as possible)
    *timer_value = SysTick->LOAD - *timer_value;                    // subtract it from reload value to get IRQ latency
    SysTick->VAL = 0;                                           // reset initial value
}   
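
One detail in the handler above: the read-modify-write of CTRL happens before VAL is captured, so those cycles are counted in every measurement. That should only add a roughly constant offset, not any variation, but a variant that reads VAL first could be sketched as follows. To keep the sketch buildable off-target, `systick_regs_t` is a hypothetical stand-in for the CMSIS `SysTick` register block, and `capture_latency` and `SYSTICK_ENABLE_BIT` are illustrative names, not part of any library.

```c
#include <stdint.h>

/* Hypothetical stand-in for the CMSIS SysTick register block so the
   sketch builds on a host; on target, use SysTick from the CMSIS header. */
typedef struct {
    volatile uint32_t CTRL;  /* control and status */
    volatile uint32_t LOAD;  /* reload value */
    volatile uint32_t VAL;   /* current value, counts down */
} systick_regs_t;

#define SYSTICK_ENABLE_BIT 1u  /* bit 0 of CTRL, the bit the handler above clears */

/* Capture first, then do the housekeeping. Returns the counts elapsed
   since the counter reloaded, assuming it wrapped exactly once. */
static uint32_t capture_latency(systick_regs_t *st)
{
    uint32_t now = st->VAL;                 /* read the counter before anything else */
    st->CTRL &= ~SYSTICK_ENABLE_BIT;        /* stop the timer */
    uint32_t latency = (st->LOAD - now) & 0x00FFFFFFu;  /* SysTick is 24 bits wide */
    st->VAL = 0;                            /* reset the current value */
    return latency;
}
```

Called with a mock block whose LOAD is 1000 and VAL is 980 (made-up numbers), this returns 20 and leaves the enable bit cleared.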

However, I find that I always get the same IRQ latency, regardless of the instruction that was interrupted. I expected the interrupt latency to be longer when a longer instruction is interrupted.

This is the function I wrote to test the attack

extern uint32_t *timer_value;
int sample_function(int *a, int *b){
    /*
     * function description -- store the smallest of the two values in a; if MEASURE_CYCLESS is defined, return the number
     * of clock cycles that have elapsed since the timer was started
     * r0 contains pointer to a
     * r1 contains pointer to b
     */

    __asm volatile(
        /*  push working registers */
        "PUSH {r4-r8} \n"
        /* move counter into r8 */
        "MOV r8, #10 \n"
        /* begin loop */
        "begin_loop: \n"
        /* decrement counter variable*/
        "SUB r8, r8, #1 \n"
        /* if counter variable not equal to 0, jump back to start of loop */
        "CMP r8, #0 \n"
        /* if r8 not equal to 0, jump back to begin of loop*/
        "BNE begin_loop \n"
        /* load a into r2 */
        "LDR r2, [r0] \n"
        /* load b into r3 */
        "LDR r3, [r1] \n"
        /*  store a-b in r4, setting status flags -- if result is 0 Z flag is set */
        "SUBS r4, r2, r3 \n"
        /* if a-b positive, a is larger  otherwise, b is larger (assuming a not equal to b)  */
        "BPL a_larger \n"
#ifdef SPY
        /* load address of the timer_value variable into r4 -- use of LDR pseudo-instruction places constant in a literal pool */
        "LDR r4, =timer_value \n"
        /* load timer_value (the pointer itself) into r4 */
        "LDR r4, [r4] \n"
        /* load address of SysTick VAL into r5 */
        "LDR r5, =0xe000e018 \n"
        /* load value at address stored in r5 (= SysTick VAL) */
        "LDR r5, [r5] \n"
        /* store SysTick VAL at the address held in r4 (i.e. *timer_value = VAL) */
        "STR r5, [r4] \n"
#endif
        "NOP \n"
        /*instruction that gets interrupted -- swap value*/
        "STR r2, [r1] \n"
        /* store b's original value (r3) at the address in r0, completing the swap */
        "STR r3, [r0] \n"
        "B end \n"
        "a_larger: \n"
        "MOV r0, #0 \n"              // instruction that gets interrupted
        "end: POP    {r4-r8}"
            );     // pop working registers
}

Note that the section of code in the #ifdef SPY block is used to automatically determine a good timer reload value (instead of using GDB), but I'm currently not using the value I obtained this way. I also have an empty loop in there to delay the instruction that is meant to be interrupted a bit.

The instruction that gets interrupted is the instruction right after the #ifdef SPY block. When I remove the NOP instruction, I still get the same interrupt latency. When I increase or decrease the timer value (to interrupt some cycles earlier or later), I also still get the same IRQ latency.

Am I missing something here? Is there some behavior I do not know about? Also, is it important to use the attribute __attribute__((interrupt("IRQ"))) for an interrupt handler?

c
arm
asked on Stack Overflow Dec 12, 2020 by BigBill

1 Answer

This is what I was thinking and commenting on.

bootstrap

.thumb_func
reset:
    bl notmain
    ldr r4,=0xE000E018
    ldr r0,=0xE000E010
    mov r1,#7
    str r1,[r0]
    b hang
.thumb_func
hang:   
    nop
    nop
    nop
    nop
    nop
    nop
    nop
    b hang

set up uart and systick

void notmain ( void )
{
    uart_init();
    hexstring(0x12345678);
    
    PUT32(STK_CSR,4);
    PUT32(STK_RVR,0xF40000);
    PUT32(STK_CVR,0x00000000);
    //PUT32(STK_CSR,7);
}

event handler

.thumb_func
.globl systick_handler
systick_handler:
    ldr r0,[r4]
    ldr r5,[sp,#0x18]
    push {r0,lr}
    bl hexstrings
    mov r0,r5
    bl hexstring
    pop {r0,pc}

grab the timer value and the address of the interrupted instruction and print them out.

00F3FFF4 08000054 
00F3FFF4 08000056 
00F3FFF4 08000058 
00F3FFF4 0800005A 
00F3FFF4 0800005C 
00F3FFF4 0800005E 
00F3FFF4 08000054 
00F3FFF4 08000056 
00F3FFF4 08000058 
00F3FFF4 0800005A 
00F3FFF4 08000050 
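
To read the first column as a latency, here is a minimal sketch. It assumes the counter was reloaded from STK_RVR (0xF40000 in notmain above) at the moment the exception was raised, and that it wrapped only once before the handler read it. The names `sample_t`, `elapsed_counts` and `latency_spread` are illustrative, and the sample table is the first six lines of the output above.

```c
#include <stddef.h>
#include <stdint.h>

/* One printed line: captured STK_CVR and the stacked return address. */
typedef struct {
    uint32_t cvr;
    uint32_t pc;
} sample_t;

/* First six lines of the output above, copied verbatim. */
static const sample_t samples[] = {
    {0x00F3FFF4u, 0x08000054u}, {0x00F3FFF4u, 0x08000056u},
    {0x00F3FFF4u, 0x08000058u}, {0x00F3FFF4u, 0x0800005Au},
    {0x00F3FFF4u, 0x0800005Cu}, {0x00F3FFF4u, 0x0800005Eu},
};

/* Counts elapsed since the 24-bit down-counter reloaded.
   Assumes exactly one wrap between reload and capture. */
static uint32_t elapsed_counts(uint32_t reload, uint32_t cvr)
{
    return (reload - cvr) & 0x00FFFFFFu;
}

/* Difference between the largest and smallest elapsed count in a set. */
static uint32_t latency_spread(const sample_t *s, size_t n, uint32_t reload)
{
    uint32_t lo = UINT32_MAX, hi = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t e = elapsed_counts(reload, s[i].cvr);
        if (e < lo) lo = e;
        if (e > hi) hi = e;
    }
    return n ? hi - lo : 0;
}
```

With the reload value 0xF40000, every sample works out to 0xC (12) counts, which includes the time for the handler's first load to execute. The spread is 0 even though the stacked address moves through the nops and the branch.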


08000050 <hang>:
 8000050:   bf00        nop
 8000052:   bf00        nop
 8000054:   bf00        nop
 8000056:   bf00        nop
 8000058:   bf00        nop
 800005a:   bf00        nop
 800005c:   bf00        nop
 800005e:   e7f7        b.n 8000050 <hang>
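
The second column comes from `ldr r5,[sp,#0x18]` in the handler. As a sketch of why that offset yields the return address, the basic exception frame pushed on entry can be written as a C struct. This assumes no floating-point state is stacked and that sp has not been modified since entry; `exception_frame_t` and `stacked_pc` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic (no floating-point state) exception frame pushed by the core on
   entry, lowest address first. sp points at r0 when the handler starts. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;
    uint32_t pc;    /* return address: where the interrupted code resumes */
    uint32_t xpsr;
} exception_frame_t;

_Static_assert(offsetof(exception_frame_t, pc) == 0x18,
               "stacked PC sits 0x18 bytes above sp at handler entry");

/* Return address recorded in a frame; on target the frame pointer would be
   the value of sp at handler entry. */
static uint32_t stacked_pc(const exception_frame_t *frame)
{
    return frame->pc;
}
```

The stacked value is the address execution returns to, which is why the printed addresses all fall inside the hang loop shown in the disassembly.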

From ARM's documentation.

Interrupt Latency

There is a maximum of a twelve cycle latency from asserting the interrupt to execution of the first instruction of the ISR when the memory being accessed has no wait states being applied. When the FPU option is implemented and a floating point context is active and the lazy stacking is not enabled, this maximum latency is increased to twenty nine cycles. The first instructions to be executed are fetched in parallel to the stack push.

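And that last line we can perhaps see happening here. You can try various instructions, but this architecture has the ability to abandon and restart the long-duration instructions (reads and push/pop, multiply, and such) instead of letting them run to completion, which would explain why the interrupted instruction has so little effect on the latency. I think that to see much of a latency difference you may need to create bus or shared-resource contention (rather than rely on differences between instructions).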

Also, SysTick is a system exception, not an external interrupt, so there may be some differences with respect to latency.

answered on Stack Overflow Dec 13, 2020 by old_timer • edited Dec 13, 2020 by old_timer

User contributions licensed under CC BY-SA 3.0