Is this a GCC bug or am I doing something wrong?

I am trying to get the final accumulate in the code below to use the ARM Cortex-M7 SMLAL instruction (signed 32*32->64-bit multiply-accumulate). If I include the T3 = T3 + 1 line, GCC does use it. If I comment that line out, it does a full 64*64-bit multiply and accumulate using 3 multiply and 2 add instructions. I don't actually want to add 1 to T3, so that line needs to go.
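
For reference, here is a portable C model of what SMLAL RdLo, RdHi, Rn, Rm computes: both 32-bit operands are sign-extended, multiplied to a full 64-bit product, and added to the 64-bit accumulator held in the register pair RdHi:RdLo. The name `smlal_model` is hypothetical; this is a sketch of the semantics, not GCC's or ARM's code. One difference: the hardware wraps on accumulator overflow, while signed overflow in C is undefined, so keep test values in range.

```c
#include <stdint.h>

/* Hypothetical reference model of SMLAL: acc += (64-bit) rn * rm.
   Widening each int32_t operand *before* the multiply yields the full
   signed 64-bit product; that is the source pattern a compiler can map
   to a single SMLAL on ARMv7-M. */
static int64_t smlal_model(int64_t acc, int32_t rn, int32_t rm)
{
    return acc + (int64_t)rn * (int64_t)rm;
}
```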

I've broken the code down so that I could analyse it in more detail, and it definitely looks as though the compiler isn't picking up that T3 is only the top 32 bits of the product (shifted down by 32 and cast to int32_t). It seems to treat T3 as if it still had 64 significant bits. But when I add the simple increment of T3, it gets it right. I tried adding zero instead, but then it goes back to the full 64*64-bit multiply.
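
As a sanity check on that reasoning, the sketch below (helper names are hypothetical) computes T3 the same way the loop does and confirms that casting to int32_t loses nothing: for any int64_t value, an arithmetic shift right by 32 already fits in 32 bits, so sign-extending T3 back to 64 bits gives exactly `T6 >> 32`. It assumes `>>` on a negative int64_t is an arithmetic shift, which is implementation-defined in C but is what GCC does.

```c
#include <stdint.h>

/* Mirrors T6 = (int64_t)T1 * x; T3 = (int32_t)(T6 >> 32);
   Assumes '>>' on negative signed values is arithmetic (true for GCC). */
static int32_t high_word(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;   /* like T6 */
    return (int32_t)(p >> 32);    /* like T3: top 32 bits of the product */
}

/* Returns nonzero if sign-extending T3 back to 64 bits reproduces
   p >> 32 exactly, i.e. the later multiply only needs 32x32 -> 64. */
static int high_word_is_lossless(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    return (int64_t)high_word(a, b) == (p >> 32);
}
```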

I'm using -O2 optimisation in ST's STM32CubeIDE, which uses a build of GCC. The other optimisation levels either never use SMLAL or unroll everything.

int64_t T4 = 0;
osc = key * NumHarmonics;
harmonic = 0;
do
{
    if (OscLevel[osc] > 1)      
    {
        OscPhase[osc] = OscPhase[osc] + (uint32_t)(T2);
        int32_t T5 = Sine[(OscPhase[osc] >> 16) & 0x0000FFFF];
        int64_t T6 = (int64_t)T1 * Tremelo[harmonic];
        int32_t T3 = (int32_t)(T6 >> 32);   // grab the most significant register
        // T3 = T3 + 1; // needs the +1 to force SMLAL on the next line (+0 doesn't help)
        T4 = T4 + (int64_t)T3 * (int64_t)T5; // should be SMLAL but does a full 64*64 mult if no +1 above
    }
    osc++;
    harmonic++;
}
while (harmonic < NumHarmonics);
OscTotal = T4;
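
One possible workaround, which is not from the original post and has not been tested in STM32CubeIDE, is to force the instruction with GCC inline assembly, where the `%Q` and `%R` operand modifiers name the low and high registers of a 64-bit operand. The helper `mac_s32` is hypothetical; it falls back to plain C on non-ARM targets (and on Thumb-1-only cores such as ARMv6-M, which lack SMLAL) so it can still be built and tested on a host. Note that inline assembly hides the operation from the optimiser, so check the disassembly afterwards.

```c
#include <stdint.h>

/* Hypothetical helper: returns acc + (int64_t)a * b.
   On ARM state or Thumb-2 with GCC it emits a single SMLAL;
   otherwise it uses portable C with the same result. */
static inline int64_t mac_s32(int64_t acc, int32_t a, int32_t b)
{
#if defined(__GNUC__) && defined(__arm__) && \
    (!defined(__thumb__) || defined(__thumb2__))
    /* "+r" on an int64_t gives a register pair: %Q0 = low, %R0 = high. */
    __asm__("smlal %Q0, %R0, %1, %2" : "+r"(acc) : "r"(a), "r"(b));
    return acc;
#else
    return acc + (int64_t)a * b;
#endif
}
```

In the loop, line `T4 = T4 + (int64_t)T3 * (int64_t)T5;` would become `T4 = mac_s32(T4, T3, T5);`.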

Without the addition:
 800054e:   4b13        ldr r3, [pc, #76]   ; (800059c <main+0xd8>)
 8000550:   f853 1024   ldr.w   r1, [r3, r4, lsl #2]
 8000554:   ea4f 79e1   mov.w   r9, r1, asr #31
 8000558:   fba7 4501   umull   r4, r5, r7, r1
 800055c:   fb07 f309   mul.w   r3, r7, r9
 8000560:   fb01 3202   mla r2, r1, r2, r3
 8000564:   4415        add r5, r2
 8000566:   e9dd 2300   ldrd    r2, r3, [sp]
 800056a:   1912        adds    r2, r2, r4
 800056c:   416b        adcs    r3, r5
 800056e:   e9cd 2300   strd    r2, r3, [sp]
                    }
                    osc++;
 8000572:   3001        adds    r0, #1
                    harmonic++;


With the addition:

 8000542:   4b0b        ldr r3, [pc, #44]   ; (8000570 <main+0xac>)
 8000544:   f853 3020   ldr.w   r3, [r3, r0, lsl #2]
 8000548:   fbc3 6701   smlal   r6, r7, r3, r1
                    }
                    osc++;
 800054c:   3201        adds    r2, #1
                    harmonic++;
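
For reference, the first listing appears to be the standard way of forming the low 64 bits of a 64*64 product from 32-bit multiplies: `umull` for low*low, then `mul` and `mla` for the two cross products, which only affect the high word, followed by `add r5, r2`. The loaded value in r1 is sign-extended into r9 by `asr #31`, so one operand is visibly treated as a 64-bit pair. Also, in the first listing T4 is loaded from and stored back to the stack (`ldrd`/`strd`), while in the second it stays in r6:r7. The register-to-variable mapping is a reading of the listing, not something the post states. The sketch below (`mul64_via_32` is a hypothetical name) reproduces that decomposition in C.

```c
#include <stdint.h>

/* Low 64 bits of a 64x64 multiply built from 32-bit pieces,
   mirroring the umull / mul / mla / add sequence in the first listing.
   Signed and unsigned products share the same low 64 bits. */
static uint64_t mul64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t lo_lo = (uint64_t)a_lo * b_lo; /* umull: full 32x32 -> 64   */
    uint32_t cross = a_lo * b_hi;           /* mul:   low 32 bits only   */
    cross += b_lo * a_hi;                   /* mla:   multiply-accumulate */
    return lo_lo + ((uint64_t)cross << 32); /* add:   into the high word */
}
```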
Tags: gcc, arm
asked on Stack Overflow May 4, 2019 by Mike Bryant • edited May 4, 2019 by Mike Bryant

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0