ESP8266 32-bit aligned memcpy


The ESP8266 runs an Xtensa core, and to read data from flash storage every access must be a 32-bit word access. To do this I wrote the following function:

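#include <stdint.h>

/* Copy len bytes from flash (src) to RAM (dst) using only 32-bit reads.
   Assumes src and dst are both 4-byte aligned. */
void memcpy_P(void * dst, const void * src, const unsigned int len)
{
  char       * _dst = (      char *)dst;
  const char * _src = (const char *)src;

  /* Copy whole 32-bit words. */
  unsigned int aligned_len = len & ~0x3;
  while(aligned_len > 0)
  {
    *(uint32_t *)_dst = *(const uint32_t *)_src;
    _dst        += 4;
    _src        += 4;
    aligned_len -= 4;
  }

  /* Copy the 1-3 trailing bytes out of one final word read.
     The ESP8266 is little-endian, so the first byte in memory
     is the least significant byte of the word. */
  const unsigned int remainder = len & 0x3;
  if (remainder > 0)
  {
    uint32_t tmp = *(const uint32_t *)_src;
    _dst[0] = (char)(tmp & 0xFF);
    if (remainder > 1)
    {
      _dst[1] = (char)((tmp >> 8) & 0xFF);
      if (remainder > 2)
        _dst[2] = (char)((tmp >> 16) & 0xFF);
    }
  }
}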
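Because of the 32-bit access rule, even a single byte has to be fetched by reading the aligned word that contains it and shifting the wanted byte down. A minimal sketch of that idea, assuming little-endian byte order (which the ESP8266 uses); `read_byte_aligned` is a hypothetical helper name, not an SDK function:

```c
#include <stdint.h>

/* Hypothetical helper: read one byte from word-access-only memory.
   Assumes little-endian byte order, so byte offset k within a word
   sits in bits 8*k .. 8*k+7. */
static inline uint8_t read_byte_aligned(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    /* Round the address down to the containing 32-bit word. */
    const uint32_t *word = (const uint32_t *)(a & ~(uintptr_t)3);
    /* Shift the wanted byte into the low 8 bits. */
    return (uint8_t)(*word >> ((a & 3u) * 8u));
}
```

For example, if the aligned word holds 0x44332211, reading byte offset 1 returns 0x22.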

Are there any changes anyone could suggest to improve performance?

Note: This is platform-specific and will never be used on any other platform or architecture, so an assembly version that specifically targets the Xtensa core would be perfectly acceptable here.

EDIT

Based on review feedback and some Googling, I have come up with the following:

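#include <stdint.h>

/* Word-at-a-time copy using only 32-bit reads and writes.
   Assumes src and dst are both 4-byte aligned; when len is not a
   multiple of 4, the final word of dst is read and rewritten in full. */
void memcpy_P(void * dst, const void * src, const unsigned int len)
{
  uint32_t       * _dst = (      uint32_t *)dst;
  const uint32_t * _src = (const uint32_t *)src;
  const uint32_t * _end = _src + (len >> 2);

  while(_src != _end)
    *_dst++ = *_src++;

  const uint32_t rem = len & 0x3;
  if (!rem)
    return;

  /* Little-endian: the first rem bytes are the low-order bytes of the word. */
  const uint32_t mask = 0xFFFFFFFFu >> ((4 - rem) << 3);
  *_dst = (*_dst & ~mask) | (*_src & mask);
}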
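
The tail step relies on a byte mask. Because the ESP8266 is little-endian, the first `rem` bytes in memory are the low-order bytes of the word, so the mask that selects them is `0xFFFFFFFF >> ((4 - rem) * 8)`; a left shift would instead select the last `rem` bytes. The sketch below isolates that step so it can be checked on a little-endian host; `tail_mask_le` and `merge_tail_le` are hypothetical names used only for illustration:

```c
#include <stdint.h>

/* Hypothetical helper: mask selecting the first rem bytes (rem = 1..3)
   of a word, assuming little-endian byte order (low-order bytes first). */
static inline uint32_t tail_mask_le(uint32_t rem)
{
    return 0xFFFFFFFFu >> ((4u - rem) << 3);
}

/* Hypothetical helper: take the first rem bytes from src_word and keep
   the remaining bytes of dst_word (the read-modify-write tail step). */
static inline uint32_t merge_tail_le(uint32_t dst_word, uint32_t src_word,
                                     uint32_t rem)
{
    const uint32_t mask = tail_mask_le(rem);
    return (dst_word & ~mask) | (src_word & mask);
}
```

For instance, merging the first 2 bytes of 0x44332211 into 0xAAAAAAAA gives 0xAAAA2211: bytes 0x11 and 0x22 come from the source and the upper two bytes of the destination are preserved.
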

Tags: c, esp8266, memcpy, micro-optimization, xtensa
asked on Stack Overflow May 23, 2017 by Geoffrey • edited Aug 14, 2019 by Peter Cordes



User contributions licensed under CC BY-SA 3.0