Why do memcpy/memmove reverse data when copying an int to a byte buffer?

5

So, my question is pretty simple:

I need to fill a char/unsigned char array with some information. Some values in the middle are taken from short/int types and this is what happens:

Code:

int foo = 15; //0x0000000F
unsigned char buffer[100]={0};

..
memcpy(&buffer[offset], &foo, sizeof(int)); // same result with memmove
...

Output:

... 0F 00 00 00 ..

For now I have written a function to reverse these fields, but I don't find this a smart solution, as it costs execution time, resources, and development time.

Is there an easier way to do it?

Edit: As many of you have pointed out, this behaviour is due to the little-endian processor, but my problem remains. I need to fill this buffer with int/short values in big-endian order, because I need to serialize the data for transmission to another machine. Whether that machine is little or big endian doesn't matter, as the protocol is already defined that way.

Note: For compiling in C++

c++
c
asked on Stack Overflow Mar 15, 2017 by Joster • edited Mar 15, 2017 by Joster

4 Answers

6

It's because the processor architecture you use is little endian. Multibyte numbers (anything bigger than a uint8_t) are stored with the least significant byte at the lowest address.

Edit

What you do about it really depends on what the buffer is for. If you are only going to use the buffer internally, forget about byte swapping: you would have to do it in both directions, and it's a waste of time.

If it is for some external entity, e.g. a file or a network protocol, the specification of the file or network protocol will say what the endianness is. For example, network byte order for all the Internet protocols is effectively big endian. The networking library provides a family of functions to convert values for use in sending and receiving Internet protocol messages. See, for instance:

https://linux.die.net/man/3/htonl
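
As a minimal sketch of that approach, assuming a POSIX system where htonl and ntohl are declared in <arpa/inet.h>, the value can be converted to network byte order first and then copied with memcpy as before. The helper names put_u32_network and get_u32_network are hypothetical and chosen only for illustration.

```c
#include <arpa/inet.h>  /* htonl, ntohl (assumes a POSIX platform) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store `value` at `dst` in network (big-endian) byte order. */
static void put_u32_network(unsigned char *dst, uint32_t value)
{
    uint32_t be = htonl(value);   /* host order -> network order */
    memcpy(dst, &be, sizeof be);  /* plain copy; the bytes are already ordered */
}

/* Hypothetical helper: read a network-order 32-bit value back into host order. */
static uint32_t get_u32_network(const unsigned char *src)
{
    uint32_t be;
    memcpy(&be, src, sizeof be);
    return ntohl(be);             /* network order -> host order */
}
```

With the question's variables, put_u32_network(&buffer[offset], (uint32_t)foo) would leave 00 00 00 0F in the buffer whatever the host byte order is. htons and ntohs are the 16-bit counterparts for short fields.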

If you want to roll your own, the portable way is to use bit shifts e.g.

#include <stdint.h>

void writeUInt32ToBufferBigEndian(uint32_t number, uint8_t* buffer)
{
    buffer[0] = (uint8_t) ((number >> 24) & 0xff);
    buffer[1] = (uint8_t) ((number >> 16) & 0xff);
    buffer[2] = (uint8_t) ((number >> 8) & 0xff);
    buffer[3] = (uint8_t) ((number >> 0) & 0xff);
}
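
The receiving side can be sketched with the same shifts in the other direction. The name readUInt32FromBufferBigEndian is an assumption chosen to mirror the writer; it does not come from any library.

```c
#include <stdint.h>

/* Hypothetical counterpart to the writer: rebuild a 32-bit value from
   four bytes stored most significant byte first. */
uint32_t readUInt32FromBufferBigEndian(const uint8_t* buffer)
{
    return ((uint32_t) buffer[0] << 24) |
           ((uint32_t) buffer[1] << 16) |
           ((uint32_t) buffer[2] << 8)  |
           ((uint32_t) buffer[3] << 0);
}
```

Because it works on values rather than on memory layout, this gives the same result on little-endian and big-endian hosts.
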
answered on Stack Overflow Mar 15, 2017 by JeremyP • edited Mar 15, 2017 by JeremyP
6

Neither memcpy nor memmove reverses data when copying objects. The byte values you observe when dumping the character array correspond to the way the 32-bit value 15 (0F in hexadecimal) is stored in memory in your environment.

It appears to be in little-endian order, which is very common in desktop and laptop computers. Other systems, such as some networking and embedded processors, store integer values in big-endian order, 00 00 00 0F, which you may consider more natural, but both methods are equally correct. It is just a matter of convention. Little-endian order means the byte holding the least significant bits is stored first, while big-endian is the opposite: the byte holding the most significant bits is stored first.
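
To see which convention a given machine uses, one can inspect the byte at the lowest address of a known multi-byte value. The following is a minimal sketch; the function name host_is_little_endian is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the least significant byte of a
   multi-byte integer is stored at the lowest address, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);  /* copy only the byte at the lowest address */
    return first == 1;
}
```

Such a check is mostly useful for diagnostics; serialization code written with shifts does not need it.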

A comprehensive article on Wikipedia covers this subject in depth.

In your application, you must specify in which order the binary value is expected to be stored, and if you decide on big-endian, I suggest you use this code for portability across environments:

#include <stdint.h>

int foo = 15; //0x0000000F
unsigned char buffer[100] = { 0 };

..
buffer[offset + 0] = ((uint32_t)foo >> 24) & 0xFF;
buffer[offset + 1] = ((uint32_t)foo >> 16) & 0xFF;
buffer[offset + 2] = ((uint32_t)foo >>  8) & 0xFF;
buffer[offset + 3] = ((uint32_t)foo >>  0) & 0xFF;
...
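
Since the question also mentions short values, the same shifts apply to 16-bit fields. A minimal sketch, with a hypothetical helper name put_u16_be:

```c
#include <stdint.h>

/* Hypothetical helper: store a 16-bit value most significant byte first. */
static void put_u16_be(unsigned char *dst, uint16_t value)
{
    dst[0] = (unsigned char)((value >> 8) & 0xFF);  /* high byte first */
    dst[1] = (unsigned char)(value & 0xFF);         /* low byte second */
}
```

A call such as put_u16_be(&buffer[offset], 0x1234) would write 12 34 at that offset on any host.
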
answered on Stack Overflow Mar 15, 2017 by chqrlie • edited Mar 15, 2017 by chqrlie
2

On the x86 architecture, integers are stored in memory in little-endian order: the lowest byte comes first. For example, 0x12345678 is stored as 78, 56, 34, 12.
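
This can be checked by copying the object representation into a byte array and formatting it. A minimal sketch follows (the helper name dump_u32_bytes is hypothetical). On a little-endian x86 machine it would format 0x12345678 as "78 56 34 12", while a big-endian machine would give "12 34 56 78".

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the in-memory bytes of `value`, lowest
   address first, as hex pairs into `out` (11 characters plus the NUL). */
static void dump_u32_bytes(uint32_t value, char out[12])
{
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);  /* object representation, unchanged */
    snprintf(out, 12, "%02X %02X %02X %02X",
             bytes[0], bytes[1], bytes[2], bytes[3]);
}
```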

answered on Stack Overflow Mar 15, 2017 by Richard Ramsden
1

The "easier way" is to stop calling it "reversed". Why, really? 0F is the least-significant part of the multi-byte value and you see it stored at the "less-significant" (i.e. lower) address. Looks perfectly consistent and natural to me. Why would you call it "reversed"?

The only thing that looks "reversed" here is that "strange" original notation of yours 0x0000000F in the comments, where you "for some reason" recorded the bytes in right-to-left order: least significant on the right, more significant on the left.

In other words, the reversal here is entirely a product of your perception/imagination. You, humans, write numbers in right-to-left order but at the same time output bytes (and write C programs) in left-to-right order. The inconsistency between the two is what creates the illusion of reversal in such situations.

answered on Stack Overflow Mar 15, 2017 by AnT • edited Mar 15, 2017 by AnT

User contributions licensed under CC BY-SA 3.0