I'm using an STM32F429 with an ARM Cortex-M4 processor. I should say up front that I don't know ARM assembly, but I need to optimize my code. I read the solution to
How to measure program execution time in ARM Cortex-A8 processor?
which is what I need, but that solution is for the Cortex-A8. On a whim, I tried the code from the link above in my own code, but I get a SEGV at this point:
if (enable_divider)
    value |= 8; // enable "by 64" divider for CCNT.
value |= 16;
// program the performance-counter control-register:
asm volatile ("MCR p15, 0, %0, c9, c12, 0\t\n" :: "r"(value)); /*<---Here I have SEGV error*/
// enable all counters:
asm volatile ("MCR p15, 0, %0, c9, c12, 1\t\n" :: "r"(0x8000000f));
// clear overflows:
asm volatile ("MCR p15, 0, %0, c9, c12, 3\t\n" :: "r"(0x8000000f));
How can I adapt this assembly code to run on the ARM Cortex-M4?
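Ditch the Cortex-A8 method. Those MCR p15 instructions write to the CP15 system control coprocessor, which the Cortex-M4 does not have, so they fault.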
This is the correct way to do it for most Cortex-M based microcontrollers (do not use SysTick!):
- Poll the timer value by using a single LDR instruction before you start your measuring.
- Insert a NOP instruction, then run the code you want to measure.
- Insert a NOP instruction, then poll the timer value by using a single LDR instruction when you end your measuring.
The NOP instructions are for accuracy, in order to make sure the pipelining does not disturb your results.
This is necessary on the Cortex-M3, because one LDR instruction takes two clock cycles. Two contiguous LDR instructions can be pipelined, so they take only 3 clock cycles total.
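In C, the polling steps could look roughly like the sketch below. It is not from the original answer: the names measure_ticks, elapsed_ticks and empty_workload are hypothetical, and it assumes a free-running 32-bit up-counting timer clocked at the CPU frequency with no interrupt attached. The address of the count register depends on the timer you pick, so it is passed in as a pointer. A volatile read normally compiles to a single LDR, but check the disassembly to be sure.

```c
#include <stdint.h>

/* Hypothetical helper: difference between two readings of an up-counting
   32-bit timer. Unsigned subtraction stays correct across one wrap-around. */
static inline uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Hypothetical empty workload, useful for measuring the fixed overhead
   (the NOPs plus the call) so it can be subtracted from real results. */
static inline void empty_workload(void) {}

/* 'counter' is assumed to point at the count register of the timer.
   Take the real address from the device's reference manual. */
static inline uint32_t measure_ticks(volatile const uint32_t *counter,
                                     void (*code_under_test)(void))
{
    uint32_t start = *counter;      /* first poll of the timer */
    __asm__ volatile ("nop");
    code_under_test();
    __asm__ volatile ("nop");
    uint32_t end = *counter;        /* second poll of the timer */
    return elapsed_ticks(start, end);
}
```

Calling measure_ticks with empty_workload first gives a baseline that includes the NOPs and the function call; subtract it from the reading you get for the real code.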
See the Cortex-M4 Technical Reference Manual at the ARM Information Center, for more information on the instruction set timing.
Of course, you should run your code from internal SRAM, in order to make sure it's not slowed down by the slow Flash memory.
I cannot guarantee that this will be 100% cycle-accurate on all devices, but it should get very close. (See Chris' comment below). You should also know that this is intended to be used in an environment with no interrupts.
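If interrupts cannot be avoided entirely, one option is to mask them for the duration of the measurement. The sketch below is an assumption on my part rather than part of the method above: irq_save_and_disable and irq_restore are hypothetical names, the inline assembly is GCC syntax, and it only takes effect when compiling for an ARMv7-M or ARMv7E-M target (elsewhere the functions do nothing, so the code still builds on a host).

```c
#include <stdint.h>

/* Hypothetical helpers: save PRIMASK and mask interrupts, then restore.
   On non-Cortex-M builds they are no-ops that return 0. */
static inline uint32_t irq_save_and_disable(void)
{
    uint32_t primask = 0;
#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
    __asm__ volatile ("mrs %0, primask" : "=r"(primask)); /* read current mask */
    __asm__ volatile ("cpsid i" ::: "memory");            /* mask interrupts */
#endif
    return primask;
}

static inline void irq_restore(uint32_t primask)
{
#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
    __asm__ volatile ("msr primask, %0" :: "r"(primask) : "memory");
#else
    (void)primask; /* nothing to restore on other targets */
#endif
}
```

Call irq_save_and_disable just before the first timer poll and irq_restore right after the second one. Keep the measured section short, since interrupts are held off for its whole duration.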