Someone I know was recently asked this question in a coding interview. They were asked to write a function `reverseBytes` such that if you pass it `0xabcdef12`, it outputs `0x12efcdab`.

They gave the following solution:
    unsigned int reverseBytes (unsigned int x) {
        unsigned int ans = 0;
        ans = ((x & 0xff000000) >> 24) | ((x & 0x00ff0000) >> 8) | ((x & 0x000000ff) << 24) | ((x & 0x0000ff00) << 8);
        return res;
    }
Are there any further ways to optimize it? One could of course drop the 1st and 3rd masks, but that still leaves at least 2 bitmask operations, 4 bitwise shifts, and 3 bitwise ORs, is that correct?

According to them, the interviewer was expecting further optimizations. I'm at a loss to see what further optimization is possible. If there is one, what is it?
First of all, this function will not work or even compile: it returns `res`, but the declared variable is `ans`. If I were the interviewer, you would have no chance of passing the test. There is also a potential problem with operator precedence if the parentheses are dropped, and a shift in the wrong direction is easy to write here. Note too that `int` is not the same as `int32_t`:
    uint32_t reverseBytes (int32_t y) {
        int32_t ans = 0;
        uint32_t x = y;
        ans = ((x & 0xff000000) >> 24) | ((x & 0x00ff0000) >> 8) | ((x & 0x000000ff) << 24) | ((x & 0x0000ff00) << 8);
        return ans;
    }
Any optimizing compiler will convert this to a single instruction if one is available on the target system. Alternatively, you can use compiler-specific builtins like `uint32_t __builtin_bswap32 (uint32_t x)`, which maps to whatever is most efficient for the particular architecture and is equally efficient regardless of the optimization level.
If you have a circular shift (rotate), start with a rotation by 16:

    abcdef12 --> ef12abcd
If not, then:

    m = ((m << 16) & 0xffff0000) |
        ((m >> 16) & 0x0000ffff);

Then:

    m = ((m << 8) & 0xff00ff00) |
        ((m >> 8) & 0x00ff00ff);
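Put together, the two stages form a complete byte swap. A compilable sketch (the function name is mine):

```c
#include <stdint.h>

/* Byte swap via the halving technique: first swap the two 16-bit
   halves, then swap adjacent bytes within each half. */
static uint32_t swap_bytes32(uint32_t m) {
    m = ((m << 16) & 0xffff0000u) | ((m >> 16) & 0x0000ffffu);
    m = ((m << 8)  & 0xff00ff00u) | ((m >> 8)  & 0x00ff00ffu);
    return m;  /* 0xabcdef12 -> 0x12efcdab */
}
```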
This technique is barely worth doing for a mere 4 items. In fact it may be slower due to the various steps other than those being counted.
That was 2 stages, where 2 = log2(4).
But for, say, swapping all 64 bits in a 64-bit word, it is quite fast. This will take 6 stages (6 = log2(64)). That is, 3 times the work for 16 times as many things being swapped.
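The 64-bit case can be sketched as follows, with one halving stage per line (6 = log2(64); the function name is mine):

```c
#include <stdint.h>

/* Reverse all 64 bits of a word in 6 halving stages: swap halves,
   then quarters, then bytes, nibbles, bit pairs, and single bits. */
static uint64_t reverse_bits64(uint64_t m) {
    m = (m << 32) | (m >> 32);
    m = ((m << 16) & 0xffff0000ffff0000u) | ((m >> 16) & 0x0000ffff0000ffffu);
    m = ((m << 8)  & 0xff00ff00ff00ff00u) | ((m >> 8)  & 0x00ff00ff00ff00ffu);
    m = ((m << 4)  & 0xf0f0f0f0f0f0f0f0u) | ((m >> 4)  & 0x0f0f0f0f0f0f0f0fu);
    m = ((m << 2)  & 0xccccccccccccccccu) | ((m >> 2)  & 0x3333333333333333u);
    m = ((m << 1)  & 0xaaaaaaaaaaaaaaaau) | ((m >> 1)  & 0x5555555555555555u);
    return m;  /* bit 0 ends up in bit 63, and so on */
}
```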
Another possible optimization: note that, for example, `0xff00ff00` and `0x00ff00ff` are bitwise complements of each other. If an and-not operation (`& ~`) is readily available in the hardware, and if loading a constant is costly, then this could be faster than the 2nd stage above:
    m = ((m << 8) & 0xff00ff00) |
        ((m >> 8) & ~0xff00ff00);
The compiler optimizer may be able to load only the one constant and use it twice.
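A compilable sketch of that stage using the complemented constant (the function name is mine; whether this actually saves anything depends on the target's instruction set and the compiler):

```c
#include <stdint.h>

/* Second stage only: swap adjacent bytes within each 16-bit half.
   The constant 0xff00ff00 appears once; the other mask is its
   bitwise complement, which an and-not instruction can exploit. */
static uint32_t swap_adjacent_bytes(uint32_t m) {
    return ((m << 8) & 0xff00ff00u) | ((m >> 8) & ~0xff00ff00u);
}
```

Applied after the 16-bit rotate, `swap_adjacent_bytes(0xef12abcd)` yields `0x12efcdab`, completing the byte reversal.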