Sorry about the clumsy title; I couldn't find a better way of expressing what I'm trying to do.
I am getting multiple 32-bit integers as input from the user. For example, the user may enter the following values (shown in hex for ease of explanation):
0x00001234
0x00005678
0x0000abcd
In this particular case, the first 2 bytes of each input are constant and the last 2 bytes are variable. For efficiency purposes, I could store 0x0000 as a single constant and create a vector of uint16_t values to store the variable portion of each input (0x1234, 0x5678, 0xabcd).
Now let's say the user enters the following:
0x00000234
0x56780000
0x00001000
In this case I would need a vector of uint32_t values to store the variable portion of the input, as each value affects different bytes.
My current thought is to do the following:
uint32_t myVal = 0;
myVal |= input1;
myVal |= input2;
// ...
And then at the end find the distance between the first and last "toggled" (i.e. 1) bit in myVal. That distance gives me the required field size for the variable portion of all of the inputs.
However, this doesn't sound like it would scale well for a large number of user inputs. Any recommendations about an elegant and efficient way of determining this?
Update:
I simplified the problem in my above explanation.
Just to be clear, I am not doing this to save memory (I have better things to do than to try and conserve a few bytes and this isn't for optimization purposes).
In summary, component A provides component B in my system with values. Sometimes these values are 128-bit, but component B only supports 32-bit values.
If the variable portion of the 128-bit value can be expressed with a 32-bit value, I can accept it. Otherwise I will need to reject it with an error.
I'm not in a position to modify component B to allow 128-bit values, or modify component A to prevent its use of 128-bit values (there are hardware limitations here too).
Though I can't see a reason for all that... Why not just compare each input with std::numeric_limits<uint16_t>::max()? If the input is larger than that, then you need to use uint32_t.
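A minimal sketch of that comparison (the function name here is mine, not from the question):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true when the value fits in 16 bits, i.e. it does not exceed
// the largest value a uint16_t can hold.
bool fits_in_uint16(uint32_t input)
{
    return input <= std::numeric_limits<uint16_t>::max();
}
```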
Answering your edit:
I suppose that for better performance you should use hardware-specific low-level instructions. You could iterate over the 32-bit parts of the input 128-bit value, subsequently adding each one to some variable and checking the difference between the next value and the current sum. If the difference isn't equal to the sum then you should skip this 128-bit value; otherwise you'll get the necessary result in the end. A sample follows:
uint32_t get_value( uint32_t v1, uint32_t v2, uint32_t v3, uint32_t v4 )
{
    uint32_t temp = v1;
    if ( temp - v2 != temp ) throw std::runtime_error( "does not fit in 32 bits" );
    temp += v2; if ( temp - v3 != temp ) throw std::runtime_error( "does not fit in 32 bits" );
    temp += v3; if ( temp - v4 != temp ) throw std::runtime_error( "does not fit in 32 bits" );
    temp += v4;
    return temp;
}
This C++ example may look silly, but I believe that in assembly code this would efficiently process the input stream.
Store the first full 128-bit number you encounter, push its low-order 32 bits onto a vector, and set bool reject_all = false. For each remaining number, if its high-order (128-32=96) bits differ from the first number's, set reject_all = true; otherwise push its low-order bits onto the vector. At the end of the loop, use reject_all to decide whether to use the vector of values.
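A sketch of that loop, assuming the 128-bit values arrive as four 32-bit words (the U128 struct and the function name are illustrative, not from the answer):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-in for a 128-bit value: four 32-bit words,
// w[0] being the least significant.
struct U128 {
    uint32_t w[4];
};

// Keep the first number's high-order 96 bits as the reference; collect the
// low-order 32 bits of every number whose high bits match. Returns false
// (reject all) if any number's high-order bits differ from the first's.
bool collect_low_words(const std::vector<U128>& inputs, std::vector<uint32_t>& out)
{
    if (inputs.empty()) return true;
    const U128& first = inputs[0];
    out.push_back(first.w[0]);
    bool reject_all = false;
    for (std::size_t i = 1; i < inputs.size(); ++i) {
        const U128& n = inputs[i];
        if (n.w[1] != first.w[1] || n.w[2] != first.w[2] || n.w[3] != first.w[3])
            reject_all = true;      // high-order 96 bits differ
        else
            out.push_back(n.w[0]);  // same prefix: keep the variable part
    }
    return !reject_all;
}
```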
The most efficient way to store a series of unsigned integers in the range [0, (2^32)-1] is by just using uint32_t. Jumping through hoops to save 2 bytes of user input is not worth your time -- the user cannot possibly enter enough integers in their lifetime that your code would have to start compressing them. They would die of old age long before memory constraints became apparent on any modern system.
It looks like you have to come up with a cumulative bitmask -- which you can then look at to see whether you have trailing or leading constant bits. An algorithm that operates on each input will be required (making it an O(n) algorithm, where n is the number of values to inspect).
The algorithm would be similar to something like what you've already done:
unsigned long long bitmask = 0uL;
std::size_t count = val.size();
for (std::size_t i = 0; i < count; ++i)
bitmask |= val[i];
You can then check to see how many bits/bytes leading/trailing can be made constant, and whether you're going to use the full 32 bits. If you have access to SSE instructions, you can vectorize this using OpenMP.
There's also a possible optimization: short-circuit by checking whether the distance between the first 1 bit and the last 1 bit is already greater than 32, in which case you can stop.
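That short-circuit could be sketched like this, using GCC/Clang bit-scan builtins (the function name is mine; adapt the builtins for other compilers):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Accumulate the cumulative OR-mask and stop early as soon as the span
// between the highest and lowest set bits exceeds 32.
bool fits_in_32_bits(const std::vector<unsigned long long>& vals)
{
    unsigned long long bitmask = 0uLL;
    for (unsigned long long v : vals) {
        bitmask |= v;
        if (bitmask == 0) continue; // clz/ctz are undefined for zero
        int high = 63 - __builtin_clzll(bitmask); // highest set bit position
        int low  = __builtin_ctzll(bitmask);      // lowest set bit position
        if (high - low + 1 > 32)
            return false; // short-circuit: the span can only grow
    }
    return true;
}
```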
For this algorithm to scale better, you're going to have to do it in parallel. Your friend would be vector processing (maybe using CUDA for Nvidia GPUs, or OpenCL if you're on the Mac or on platforms that already support OpenCL, or just OpenMP annotations).
Use
uint32_t ORVal = 0;
uint32_t ANDVal = 0xFFFFFFFF;
ORVal |= input1;
ANDVal &= input1;
ORVal |= input2;
ANDVal &= input2;
ORVal |= input3;
ANDVal &= input3; // etc.
// At end of input...
uint32_t mask = ORVal ^ ANDVal;
// bit positions set to 0 were constant, bit positions set to 1 changed
A bit position in ORVal will be 1 if at least one input had 1 in that position, and 0 if ALL inputs had 0 in that position. A bit position in ANDVal will be 0 if at least one input had 0 in that position, and 1 if ALL inputs had 1 in that position.
If a bit position was always 1 in the inputs, then ORVal and ANDVal will both have it set to 1.
If a bit position was always 0 in the inputs, then ORVal and ANDVal will both have it set to 0.
If there was a mix of 0 and 1 in a bit position, then ORVal will have it set to 1 and ANDVal to 0; hence the XOR at the end gives the mask of bit positions that changed.
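Rolled into a runnable function (the name is mine) and applied to the question's first set of sample inputs:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Accumulate OR and AND across all inputs; the XOR of the two results
// marks exactly the bit positions that changed between inputs.
uint32_t changed_bits_mask(const std::vector<uint32_t>& inputs)
{
    uint32_t or_val  = 0;
    uint32_t and_val = 0xFFFFFFFFu;
    for (uint32_t v : inputs) {
        or_val  |= v;
        and_val &= v;
    }
    return or_val ^ and_val;
}
```

For the sample inputs 0x00001234, 0x00005678, 0x0000abcd this yields 0x0000fdfd: every changing bit sits in the low 16 bits, so a uint16_t field would suffice for the variable portion.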
User contributions licensed under CC BY-SA 3.0