STM32 RAM data corruption


Target: STM32F103RCT6 (ARM Cortex-M3 core)

IDE: Keil MDK V5.21

I have a program on the Keil + STM32 platform that has been running well and has been deployed in over a million products. Recently, however, I found occasional hard faults. After analyzing the stacked data and the corresponding assembly code, I found that the value of a variable had unexpectedly changed to a very large value. It is a static local variable inside a function. I checked the code and I don't think it can cause this problem. I also analyzed the map file: the variable is near the start of RAM and the system stack is at a much higher address, so I don't think a stack overflow is the reason. The stack is large enough (8 KB), and the variable and the bottom of the stack are about 7 KB apart, so if the stack overflowed it should cause a hard fault before it reached this variable. I also checked whether the variables at adjacent addresses (especially the one just below it) are manipulated through pointers in a way that could overwrite it, and found no such pitfalls. The only other thing I can think of is that, to save code space, I compile with optimization level 3. I know I shouldn't suspect the compiler, but I have no other explanation.

Here are the details:

1. In the HardFault ISR I read the MSP, copy the stacked data to an unused flash address, and also save some of the core's fault status registers to determine the cause of the error. The HardFault ISR code is as follows:

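#include <string.h>     /* memcpy */
#include "stm32f10x.h"  /* CMSIS core: __get_MSP() */

uint32_t sp = 0;
uint32_t sk[16] = {0};

void HardFault_Handler(void)
{
  /* Go to infinite loop when Hard Fault exception occurs */

    /* Assumes the fault was taken on the main stack (MSP) and that nothing
       was pushed on entry to this handler before this read, which is why
       the "+ 8" adjustment is disabled. */
    sp = __get_MSP();// + 8;
    /* sk[0..7]: stacked exception frame R0, R1, R2, R3, R12, LR, PC, xPSR */
    memcpy(sk, (uint8_t *)sp, 32);
    /* sk[8..15]: 8 words from 0xE000ED28:
       CFSR, HFSR, DFSR, MMFAR, BFAR, AFSR, ID_PFR0, ID_PFR1 */
    memcpy(&sk[8], (uint8_t *)0xE000ED28, 32);

    Flash_Prepared(0x0803F000, 0x800);
    Flash_Write(0x0803f000, (uint8_t *)sk, 64);
    while (1)
    {
    }
}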
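The handler reads MSP unconditionally, which is only valid if the faulting code was running on the main stack. A common refinement is to choose between MSP and PSP using bit 2 of the EXC_RETURN value that the core places in LR on exception entry. The sketch below shows only that selection logic as a host-testable C function. The name `select_fault_stack` is hypothetical. In real firmware, EXC_RETURN must be captured from LR (typically in a short assembly wrapper) before the compiled handler changes LR or the stack, and that part is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* EXC_RETURN (the value in LR on exception entry), bit 2:
   0 = the exception frame was pushed on MSP, 1 = it was pushed on PSP. */
#define EXC_RETURN_SPSEL (1u << 2)

/* Hypothetical helper: pick the stack pointer that holds the stacked frame. */
static uint32_t select_fault_stack(uint32_t exc_return, uint32_t msp, uint32_t psp)
{
    return (exc_return & EXC_RETURN_SPSEL) ? psp : msp;
}
```

For example, 0xFFFFFFF1 (Handler mode) and 0xFFFFFFF9 (Thread mode on MSP) select MSP, while 0xFFFFFFFD (Thread mode on PSP) selects PSP.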

2. After a hard fault occurs, I read the data back from flash for analysis. The data I read was:

0x200003d0  0x00000000  0x0000F07C  0x20000f80  0x0000012c  0x080219d9  0x0802006c  0x01000000
0x00008200  0x40000000  0x00000000  0x20010000  0x20010000  0x00000000  0x00000030  0x00000200

According to the Cortex-M3 documentation, the first eight words are the stacked exception frame, so when the hard fault occurred:

R0 = 0x200003d0    R1 = 0
R2 = 0x0000F07C    R3 = 0x20000f80
R12 = 0x0000012c   LR = 0x080219d9
PC = 0x0802006c    xPSR = 0x01000000
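The mapping of the dump can be expressed as two structs over the 16 words the handler saved. This is a host-side sketch for decoding the flash dump. The struct and function names are hypothetical, and the second struct assumes the eight consecutive words starting at 0xE000ED28 (CFSR, HFSR, DFSR, MMFAR, BFAR, AFSR, ID_PFR0, ID_PFR1). The stacked xPSR = 0x01000000 has only the Thumb bit (bit 24) set and IPSR = 0, which is consistent with a valid frame taken while the interrupted code was in Thread mode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Basic exception frame pushed by the Cortex-M3 on exception entry. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} ExceptionFrame;

/* Eight consecutive SCB words copied from 0xE000ED28 by the handler. */
typedef struct {
    uint32_t cfsr, hfsr, dfsr, mmfar, bfar, afsr, id_pfr0, id_pfr1;
} FaultRegs;

_Static_assert(sizeof(ExceptionFrame) == 32, "frame is 8 words");
_Static_assert(sizeof(FaultRegs) == 32, "fault regs are 8 words");

/* Hypothetical helper: split the 16-word sk[] dump read back from flash. */
static void split_dump(const uint32_t sk[16], ExceptionFrame *f, FaultRegs *r)
{
    memcpy(f, &sk[0], sizeof *f);
    memcpy(r, &sk[8], sizeof *r);
}

/* The dump from this fault, in the order the handler wrote it. */
static const uint32_t dump[16] = {
    0x200003d0, 0x00000000, 0x0000F07C, 0x20000f80,
    0x0000012c, 0x080219d9, 0x0802006c, 0x01000000,
    0x00008200, 0x40000000, 0x00000000, 0x20010000,
    0x20010000, 0x00000000, 0x00000030, 0x00000200,
};
```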



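The remaining eight words are the registers starting at 0xE000ED28: CFSR = 0x00008200, HFSR = 0x40000000, DFSR = 0, MMFAR = 0x20010000, BFAR = 0x20010000, AFSR = 0, followed by ID_PFR0 and ID_PFR1.

BFSR (CFSR bits 15:8) = 0x82, that is, BFARVALID (bit 7) = 1 and PRECISERR (bit 1) = 1. HFSR.FORCED (bit 30) = 1, so a precise bus fault was escalated to a hard fault.
BFAR = 0x20010000 means the fault was caused by a data access to 0x20010000. This is reasonable because the STM32F103RC has 48 KB of SRAM (0x20000000 to 0x2000BFFF), so 0x20010000 lies outside RAM.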
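The bit checks above can be sketched as a small decoder for the saved words. This is a minimal sketch assuming the 16-word layout captured by the handler. The bit positions follow the ARMv7-M System Control Block definitions, and the helper names are hypothetical. The last test case shows why the byte position matters. The value 0x82 in the low byte (MMFSR) would mean MMARVALID + DACCVIOL, whereas in this dump it sits in BFSR.

```c
#include <assert.h>
#include <stdint.h>

/* CFSR at 0xE000ED28 packs three fault status registers:
   UFSR = bits 31:16, BFSR = bits 15:8, MMFSR = bits 7:0. */
#define CFSR_BFSR(cfsr)   (((cfsr) >> 8) & 0xFFu)
#define BFSR_PRECISERR    (1u << 1)  /* precise data bus error */
#define BFSR_BFARVALID    (1u << 7)  /* BFAR holds the faulting address */
#define HFSR_FORCED       (1u << 30) /* configurable fault escalated to HardFault */

/* Hypothetical helper: returns 1 and stores the faulting data address in
   *addr when CFSR reports a precise bus fault with a valid BFAR. */
static int precise_bus_fault(uint32_t cfsr, uint32_t bfar, uint32_t *addr)
{
    uint32_t bfsr = CFSR_BFSR(cfsr);
    if ((bfsr & BFSR_PRECISERR) && (bfsr & BFSR_BFARVALID)) {
        *addr = bfar;
        return 1;
    }
    return 0;
}

/* Hypothetical helper: returns 1 if the HardFault was escalated. */
static int hardfault_forced(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0;
}
```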

3. I located the PC address (0x0802006c) in the disassembly window. The code at this address is:

   304:    EXT_UART->DR =   m_EUART_TxFrames[m_EUART_TxFrm_Tail].buf[s_count];               
   305:         s_count++; 
0x0802005A 4B13      LDR      r3,[pc,#76]  ; @0x080200A8
0x0802005C F9902000  LDRSB    r2,[r0,#0x00]
0x08020060 4D0F      LDR      r5,[pc,#60]  ; @0x080200A0
0x08020062 EB032402  ADD      r4,r3,r2,LSL #8
0x08020066 6942      LDR      r2,[r0,#0x14]
0x08020068 1D2D      ADDS     r5,r5,#4
0x0802006A 4414      ADD      r4,r4,r2
0x0802006C 7924      LDRB     r4,[r4,#0x04]
0x0802006E 802C      STRH     r4,[r5,#0x00]
0x08020070 1C52      ADDS     r2,r2,#1

The map file gives the addresses of these variables:

m_EUART_TxFrames                 0x20000f80   Data        1024  uart.o(.bss)
m_EUART_TxFrm_Tail               0x200003d0   Data           1  uart.o(.data)
s_count                          0x200003e4   Data           4  uart.o(.data)

Analyzing the code together with the stacked register values:

r0 = 0x200003d0 = &m_EUART_TxFrm_Tail
r3 = 0x20000f80 = m_EUART_TxFrames (base address of the array)

LDRSB    r2,[r0,#0x00] ==> r2 = m_EUART_TxFrm_Tail
ADD      r4,r3,r2,LSL #8 ==> r4 = &m_EUART_TxFrames[m_EUART_TxFrm_Tail]  (sizeof(EUART_Frame) = 256)
LDR      r2,[r0,#0x14] ==> r2 = s_count  (0x200003d0 + 0x14 = 0x200003e4 = &s_count)
ADD      r4,r4,r2 ==> r4 = (u8 *)&m_EUART_TxFrames[m_EUART_TxFrm_Tail] + s_count

Then comes the instruction that causes the fault:

LDRB     r4,[r4,#0x04] ==> r4 = m_EUART_TxFrames[m_EUART_TxFrm_Tail].buf[s_count]  (buf is at offset 4)

The stacked R2 = 0x0000F07C is s_count, and this value must be wrong, because according to the code s_count can never be greater than 250. Since the stack frame holds no more information, I can't tell whether only s_count is wrong or whether this whole block of RAM is corrupted.
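The fault address agrees with the stacked R2. With sizeof(EUART_Frame) = 256 and buf at offset 4, 0x20000F80 + 4 + 0xF07C = 0x20010000, which equals BFAR. This means m_EUART_TxFrm_Tail was 0 at the time of the fault. By the same arithmetic, the highest address the code should ever read is 0x2000137D (frame 3, buf[249]). The following is a host-side sketch of that arithmetic. It assumes s32/u8 are the usual int32_t/uint8_t typedefs from the StdPeriph headers, and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t s32;   /* assumption: StdPeriph-style typedefs */
typedef uint8_t u8;

#define EUART_TX_FRMBUF_SIZE 250

typedef struct _EUART_Frame {
    s32 len;
    u8  buf[EUART_TX_FRMBUF_SIZE];
} EUART_Frame;

/* 4 + 250 = 254 bytes, padded to 256 for 4-byte alignment: this is the
   "LSL #8" in the disassembly, and buf at offset 4 is the "[r4,#0x04]". */
_Static_assert(sizeof(EUART_Frame) == 256, "frame stride");
_Static_assert(offsetof(EUART_Frame, buf) == 4, "buf offset");

/* Hypothetical helper: the address LDRB r4,[r4,#0x04] reads. */
static uint32_t tx_byte_address(uint32_t frames_base, int tail, uint32_t count)
{
    return (uint32_t)(frames_base + (uint32_t)tail * sizeof(EUART_Frame)
                      + offsetof(EUART_Frame, buf) + count);
}

/* Hypothetical helper: s_count implied by a fault address for a given tail. */
static uint32_t count_from_fault(uint32_t frames_base, int tail, uint32_t fault_addr)
{
    return fault_addr - tx_byte_address(frames_base, tail, 0);
}
```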

The C code corresponding to this assembly is:

#define EUART_RX_BUF_SIZE           250         

#define EUART_TX_FRM_SIZE               4           
#define EUART_TX_FRMBUF_SIZE        250         

typedef struct _EUART_Frame {               
    s32 len;                                                    //len
    u8 buf[EUART_TX_FRMBUF_SIZE];           //data
} EUART_Frame;

EUART_Frame m_EUART_TxFrames[EUART_TX_FRM_SIZE];        
volatile s8 m_EUART_TxFrm_Tail = 0;                 
volatile s8 m_EUART_TxFrm_Head = 0;                 
volatile s8 m_EUART_TxFrm_FreeFrmLen = 0;       

void UART_CheckSend(void)
{
    static s32 s_count = 0;
    u32 temp32 = 0;

    if(m_bEUARTPushingFrms || m_bEUARTCheckingSend)
        return;
    m_bEUARTCheckingSend = 1;

    if ((EXT_UART->SR & USART_FLAG_TXE) == (uint16_t)RESET) 
    {
        m_bEUARTCheckingSend = 0;
        return;
    }

    if(m_EUART_TxFrm_Head == m_EUART_TxFrm_Tail)                
    {
        if((EXT_UART->SR & USART_FLAG_TC) != (uint16_t)RESET)       
        {
            if(m_bEUARTTxEn)
            {
                m_bEUARTTxEn = 0;
                temp32 = GPIOC->CRH;
                temp32 &= ~(0x00000000F<<8);        /* clear PC10 CNF/MODE (CRH bits 11:8) */
                temp32 |= (0x000000004<<8);         /* 0x4: floating input, releases the TX pin */
                GPIOC->CRH = temp32;
            }
        }
        m_bEUARTCheckingSend = 0;
        return;
    }
    if(!m_bEUARTTxEn)
    {
        m_bEUARTTxEn = 1;
        temp32 = GPIOC->CRH;
        temp32 &= ~(0x00000000F<<8);        /* clear PC10 CNF/MODE (CRH bits 11:8) */
        temp32 |= (0x000000009<<8);         /* 0x9: alternate-function push-pull output, 10 MHz */
        GPIOC->CRH = temp32;
    }

    EXT_UART->DR = m_EUART_TxFrames[m_EUART_TxFrm_Tail].buf[s_count];       
    s_count++;
    if(s_count >= m_EUART_TxFrames[m_EUART_TxFrm_Tail].len) 
    {
        s_count = 0;
        m_EUART_TxFrm_Tail++;
        if(m_EUART_TxFrm_Tail == EUART_TX_FRM_SIZE)
            m_EUART_TxFrm_Tail = 0;
        m_EUART_TxFrm_FreeFrmLen++;
    }
    m_bEUARTCheckingSend = 0;
}

I'm fairly sure that m_EUART_TxFrames[m_EUART_TxFrm_Tail].len is never greater than 250. Even if it really were a large value, I don't think s_count would keep counting all the way up to 0x0000F07C before a hard fault occurred.

Assuming there is no problem with the code, the only possibilities I can think of are:

1. Stack overflow. But the map file shows:

 STACK              0x200020a0   Section     8192  startup_stm32f10x_hd.o(STACK)
 __initial_sp       0x200040a0   Data           0  startup_stm32f10x_hd.o(STACK)

The stack is 8 KB in size and grows downward from __initial_sp (0x200040A0). Its lowest address (0x200020A0) is about 7 KB above s_count (0x200003E4), with many other variables in between. If the stack overflowed, I think the program would crash before s_count was affected.
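The distances quoted here can be checked from the map-file entries. This is a small sketch: the macro and function names are made up for illustration, and the addresses are the ones listed in the map file.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses taken from the map file */
#define S_COUNT_ADDR  0x200003E4u  /* s_count, 4 bytes, uart.o(.data) */
#define STACK_BASE    0x200020A0u  /* STACK section, lowest address */
#define STACK_SIZE    8192u
#define INITIAL_SP    0x200040A0u  /* __initial_sp, top of the stack */

/* The Cortex-M3 stack is full-descending: SP starts at INITIAL_SP and moves
   down, so an overflow first writes just below STACK_BASE. Returns the
   number of bytes between the end of s_count and the bottom of the stack. */
static uint32_t bytes_between_s_count_and_stack(void)
{
    return STACK_BASE - (S_COUNT_ADDR + 4u);
}
```

The result is 0x1CB8 (7352) bytes. An overflow would have to overrun that many bytes of other variables before reaching s_count.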

2. s_count is overwritten by pointer operations on a variable located just below it in memory. I also analyzed the map file and found no code that could cause this.

3. The ARMCC optimizer (level 3) generates code that causes this problem. But I can't prove it, and “the compiler is always right”.

I'm out of ideas now, and I would be very grateful if anyone could offer any suggestions!

compiler-optimization
keil
fault
stm32f1
asked on Stack Overflow Aug 14, 2018 by Ian • edited Aug 14, 2018 by Ian



User contributions licensed under CC BY-SA 3.0