I want to fan out binary data like in the following examples:
8 bit to 16 bit:
0 0 0 1 0 0 0 1 (0x11) becomes
0000 0011 0000 0011 (0x0303)
8 bit to 32 bit:
0 1 0 0 0 1 1 0 (0x46) becomes
0000 1111 0000 0000 0000 1111 1111 0000 (0x0F000FF0)
16 to 32 bit:
0 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1 (0x137F) becomes
0000 0011 0000 1111 0011 1111 1111 1111 (0x030F3FFF)
Since this question is tagged with bit-manipulation
, I am looking for a compact solution without any loops or additional variables needed. The code should also be platform independent, so inline assembly is not what I want either.
This may be a duplicate question, but I have no idea how to search for it. Most certainly there is a proper name for what I want to do, but I just don't know it.
EDIT: To clarify some of your comments, here is a list of methods the solution should not contain:
bit-fiddling
)Since you asked for it, here are some further details on my use case: I'm working on embedded platforms (microcontrollers) and want to mask registers. This specific case is about IO registers, where I have 16 pads for each port and several port registers to configure the pads. Some of these configuration registers use 2 bits per pad, some 4 bits. So if I want to mask individual pads when accessing the registers I need to mask those registers accordingly.
Let's say I have two configuration registers and want to access pads 4 and 8:
uint32_t configreg_2perpad; // the first configuration register
uint32_t configreg_4perpad_low; // lower half of the second configuration register
uint32_t configreg_4perpad_high; // higher half of the second configuration register
uint8_t pad4 = 3; // we start counting at 0, of course
uint8_t pad8 = 7;
uint16_t pad_mask = (1 << pad4) |
(1 << pad8); // this is now 0x0088
So what I am looking for now is a bit-wise manipulation of pad_mask
so I get according masks for the registers, which would be
uint32_t configreg_2perpad_mask = 0x0000C0C0;
uint32_t configreg_4perpad_low_mask = 0xF000F000;
uint32_t configreg_4perpad_high_mask = 0x00000000;
Obviously, since memory (both RAM and flash) is a very scarce resource on µCs, I would prefer a solution without the need for further static variables like look up tables (waste of RAM) or static functions (waste of flash).
Fanning 8 bits to 16 can be done in a few lines of code:
uint16_t n = 0x11;
n = (n | (n << 4)) & 0x0f0f;
n = (n | (n << 2)) & 0x3333;
n = (n | (n << 1)) & 0x5555;
n = (n | (n << 1));
printf("0x%04x\n", n); // prints 0x0303
Here's how it works. Start with the 8 bits in a 16-bit variable:
0000 0000 abcd efgh // letters a to h represent the bits of the 8-bit number
Shift left by 4 bits and OR: n = n | (n << 4)
0000 abcd ???? efgh // bits with a '?' are garbage we don't want
Mask off the garbage: n = n & 0x0f0f
0000 abcd 0000 efgh
Shift left by 2 bits and OR: n = n | (n << 2)
00ab ??cd 00ef ??gh
Mask off the garbage: n = n & 0x3333
00ab 00cd 00ef 00gh
Shift left by 1 bit and OR: n = n | (n << 1)
0a?b 0c?d 0e?f 0g?h
Mask off the garbage: n = n & 0x5555
0a0b 0c0d 0e0f 0g0h
Now all that's left is to duplicate the bits: n = n | (n << 1)
aabb ccdd eeff gghh
This can also be done with a lookup table. The table itself is exactly 16 bytes. Declaring it as static const
allows the compiler to put the table in ROM. The savings in code size (fewer shifts and masking operations) typically makes up for the ROM used by the table. And the lookup method should be faster (2 lookups, 2 shifts, 1 mask, 1 OR) instead of (4 shifts, 3 masks, 4 ORs)
static const uint8_t table[16] = {
0x00, 0x03, 0x0c, 0x0f, 0x30, 0x33, 0x3c, 0x3f,
0xc0, 0xc3, 0xcc, 0xcf, 0xf0, 0xf3, 0xfc, 0xff
};
uint16_t n = 0x11;
n = (table[n >> 4] << 8) | table[n & 0xf];
printf("0x%04x\n", n); // prints 0x0303
Fanning from 8 bits to 32 is about the same as 8-to-16, but it requires more duplication at the end:
uint32_t n = 0x46;
n = (n | (n << 12)) & 0x000f000f;
n = (n | (n << 6)) & 0x03030303;
n = (n | (n << 3)) & 0x11111111;
n = (n | (n << 1));
n = (n | (n << 2));
printf("0x%08x\n", n); // prints 0x0f000ff0
At alternative method of duplicating the bits (suggested by @wim) in the comments is to replace
n = (n | (n << 1));
n = (n | (n << 2));
with
n = (n << 4) - n;
Fanning from 16 bits to 32 requires an extra shift-and-mask step:
uint32_t n = 0x137f;
n = (n | (n << 8)) & 0x00ff00ff;
n = (n | (n << 4)) & 0x0f0f0f0f;
n = (n | (n << 2)) & 0x33333333;
n = (n | (n << 1)) & 0x55555555;
n = (n | (n << 1));
printf("0x%08x\n", n); // prints 0x030f3fff
Assuming 8-bits per byte. 2's complement arch.
This is what I started with:
uint32_t Eight_To_Sixteen(const uint8_t source)
{
uint32_t result = (source & 1);
result |= ((source & 2) << 1);
result |= ((source & 4) << 2);
result |= ((source & 8) << 3);
result |= ((source & 0x10) << 4);
result |= ((source & 0x20) << 5);
result |= ((source & 0x40) << 6);
result |= ((source & 0x80) << 7);
return (result | (result << 1));
}
The above isn't bad. Notice how it folds the result over with itself to duplicate all the bits.
And then I started to compare with a 2-bit table lookup approach:
uint32_t Eight_To_Sixteen(const uint8_t source)
{
static const int table[] = { 0, 3, 0xc, 0xf };
uint32_t result = table[(source & 0x03)];
result |= (table[(source & 0x0c)>>2]) << 4;
result |= (table[(source & 0x30) >> 4]) << 8;
result |= (table[(source & 0xc0) >> 6]) << 12;
return result;
}
Still benchmarking, but the latter is generating far less code. You could expand it to have a larger lookup table and support wider inputs.
User contributions licensed under CC BY-SA 3.0