Reason for & and | in endianness conversion

0
// Swap endian (big to little) or (little to big)
uint32_t num = 9;
uint32_t b0,b1,b2,b3;
uint32_t res;

b0 = (num & 0x000000ff) << 24u;
b1 = (num & 0x0000ff00) << 8u;
b2 = (num & 0x00ff0000) >> 8u;
b3 = (num & 0xff000000) >> 24u;
res = b0 | b1 | b2 | b3;

I got this code from an answer posted at Convert Little Endian to Big Endian

I understand that the steps above swap the bytes to convert between little and big endian. But why is each of b0, b1, ... computed with "&" against a mask (0x0000FF00, ...), and why are they combined with "|" at the end to get the result? Can someone explain this so that I can understand the conversion between endiannesses?

c
asked on Stack Overflow May 23, 2015 by EnthusiatForProgramming • edited May 23, 2017 by Community

2 Answers

6

It's so you can mask bits off and set bits. At least that's the simple answer. In C, & is the bitwise AND operator and | is the bitwise OR operator (which are a little different than the logical && and || used for boolean operations). Take a look at the truth tables below.

AND       OR
A B X     A B X
0 0 0     0 0 0
0 1 0     0 1 1
1 0 0     1 0 1
1 1 1     1 1 1

A and B are inputs and X is the output. So when you do a 16-bit endian swap, you would use a macro like this:

/* x is assumed to be a uint16_t; the cast drops the bits shifted past bit 15 */
#define endianswap16(x)  ((uint16_t)(((x) << 8) | ((x) >> 8)))

This shifts x left by 8 bits and right by 8 bits, then ORs the two results together to get the endian swap. The 32-bit endian swap uses both & and | in addition to bit shifting:

#define endianswap32(x)  (((x) << 24) | (((x) & 0x0000FF00) << 8) \
  | (((x) & 0x00FF0000) >> 8) | ((x) >> 24))

Since 32 bits is 4 bytes, this swaps the two outer bytes with each other and then swaps the two inner bytes with each other. Then it does bitwise ORs to put the 32-bit number back together. The ANDs mask off certain bit positions so that each shifted piece contains only the byte it is supposed to carry; when the pieces are ORed together to reconstruct the number, no stray bits from one piece corrupt another.
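To see why the masks matter, here is a minimal sketch of the same swap written as a function instead of a macro. The names swap32 and swap32_no_mask are made up for this illustration; the second one deliberately leaves the masks out to show what goes wrong.

```c
#include <stdint.h>

/* Byte-swap a 32-bit value: each mask isolates one byte before it is moved. */
static uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |  /* lowest byte  -> highest byte */
           ((x & 0x0000FF00u) << 8)  |  /* second byte  -> third byte   */
           ((x & 0x00FF0000u) >> 8)  |  /* third byte   -> second byte  */
           ((x & 0xFF000000u) >> 24);   /* highest byte -> lowest byte  */
}

/* Deliberately wrong (hypothetical): without masks the shifted copies
 * still carry their neighbouring bytes, and the ORs smear them together. */
static uint32_t swap32_no_mask(uint32_t x)
{
    return (x << 24) | (x << 8) | (x >> 8) | (x >> 24);
}
```

With x = 0x12345678, swap32 gives 0x78563412, while swap32_no_mask gives 0x7C567C56 because the two middle terms overlap with the outer ones. The outer terms would work without masks for a uint32_t (the shift itself pushes the unwanted bytes out), but masking all four keeps the code uniform.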

As for why we do it this way: we have to reverse the order of the bytes. Take 0x12345678 for instance. Stored in memory on little and big endian machines, it looks like this:

---> Increasing Memory Address
78 56 34 12   Little Endian
12 34 56 78   Big Endian
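One way to check this layout on your own machine is to copy the value into a byte array and look at the bytes in address order. This is only a sketch; bytes_in_memory and host_is_little_endian are hypothetical helper names, and it assumes the host is either plain little endian or plain big endian.

```c
#include <stdint.h>
#include <string.h>

/* Copy the bytes of value into out[0..3] in the order they sit in memory. */
static void bytes_in_memory(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 if the lowest-addressed byte is the least significant one. */
static int host_is_little_endian(void)
{
    unsigned char b[4];
    bytes_in_memory(0x12345678u, b);
    return b[0] == 0x78;
}
```

On a little endian host the array holds 78 56 34 12; on a big endian host it holds 12 34 56 78.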

Intel processors and their clones are little endian machines, a format that actually has some advantages over big endian. Big endian machines include the IBM S/360 and its descendants (z/Architecture), SPARC, the Motorola 68000 series, PowerPC, MIPS, and others.

There are two big problems when dealing with platforms that differ in endianness:

  • The first one is when exchanging binary data between big and little endian platforms.
  • The second one is when software takes a multibyte value and splits it up into different byte values.

An example of the first problem is Intel machines communicating over a network. Network addresses are in network byte order, which is big endian, but Intel machines are little endian. So IP addresses and similar values need their bytes swapped to be interpreted correctly.
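A common way to sidestep both problems is to never copy the in-memory representation at all, and instead build the byte sequence with shifts and masks. The sketch below is one possible approach, with hypothetical helper names store_be32 and load_be32; sockets code on POSIX systems would typically use htonl() and ntohl() from <arpa/inet.h> for the same purpose.

```c
#include <stdint.h>

/* Write value into buf in big endian (network) order, whatever the host is. */
static void store_be32(unsigned char buf[4], uint32_t value)
{
    buf[0] = (unsigned char)((value >> 24) & 0xFFu);  /* most significant byte first */
    buf[1] = (unsigned char)((value >> 16) & 0xFFu);
    buf[2] = (unsigned char)((value >> 8) & 0xFFu);
    buf[3] = (unsigned char)(value & 0xFFu);          /* least significant byte last */
}

/* Rebuild the value from four big endian bytes. */
static uint32_t load_be32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Because the shifts operate on the value and not on its memory layout, the same code produces the bytes C0 A8 00 01 for 0xC0A80001 (192.168.0.1) on both little and big endian hosts.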

answered on Stack Overflow May 23, 2015 by Daniel Rudy • edited May 23, 2015 by Daniel Rudy

1

It's fairly straightforward bit masking and shifting.

The (num & 0x000000ff) zeros out all but the lowest byte of the word. The << 24u shifts it left by 24 bits, or three 8-bit bytes, putting it at the other end of the word. The next three lines move the remaining 3 bytes in a similar manner. Then b0 | b1 | b2 | b3 combines those bytes to make the final word.
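To watch the intermediate values, the code from the question can be wrapped in a small function that returns every step. The struct swap_trace and the function trace_swap are names invented for this sketch; the five assignments are the ones from the question.

```c
#include <stdint.h>

/* Holds each intermediate value from the question's code. */
struct swap_trace {
    uint32_t b0, b1, b2, b3, res;
};

static struct swap_trace trace_swap(uint32_t num)
{
    struct swap_trace t;
    t.b0 = (num & 0x000000ffu) << 24u;  /* lowest byte moved to the top     */
    t.b1 = (num & 0x0000ff00u) << 8u;   /* second byte moved up one place   */
    t.b2 = (num & 0x00ff0000u) >> 8u;   /* third byte moved down one place  */
    t.b3 = (num & 0xff000000u) >> 24u;  /* highest byte moved to the bottom */
    t.res = t.b0 | t.b1 | t.b2 | t.b3;  /* no two pieces share a set bit    */
    return t;
}
```

For num = 0x12345678 the pieces are b0 = 0x78000000, b1 = 0x00560000, b2 = 0x00003400 and b3 = 0x00000012, which OR together to 0x78563412. For the question's num = 9, only b0 is nonzero and res is 0x09000000.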

See How do you set, clear, and toggle a single bit? and What are bitwise shift (bit-shift) operators and how do they work?. The same operations that work on single bits work on groups of bits, in this case the 8 bits that make a byte.

answered on Stack Overflow May 23, 2015 by AShelly • edited May 23, 2017 by Community

User contributions licensed under CC BY-SA 3.0