I am writing code that may run on architectures of different word size (32-bit, 64-bit, etc) and I want to clear the low byte of a value. There is a macro (MAX) that is set to the maximum value of a word. So, for example, on a 32-bit system MAX = 0xFFFFFFFF and on a 64-bit system MAX = 0xFFFFFFFFFFFFFFFF for unsigned values. If I have a word-sized variable that may be signed or unsigned, how can I clear the low byte of the variable with a single expression (no branching)?
My first idea was:
value & ~( MAX - 0xFF )
but this does not appear to work for signed values. My other thought was:
value = value - (value & 0xFF)
which has the disadvantage that it requires a stack operation.
Clearing the low byte without knowing the width of the integer type can easily produce incorrect code, so some care is needed.
Consider the code below, where value is wider than int/unsigned. 0xFF is an int constant with the value 255. ~0xFF is then that value with its bits inverted. With the common 2's complement encoding, that is -256, with its upper bits set: FF...FF00. -256 converted to a wider signed type retains its value and the pattern FF...FF00. -256 converted to a wider unsigned type becomes Uxxx_MAX + 1 - 256, again with the bit pattern FF...FF00. In both cases, the & retains the upper bits and clears the lower 8.
value_low_8bits_cleared = value & ~0xFF;
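As a rough check of the reasoning above, here is a minimal sketch that applies the same mask to 64-bit signed and unsigned values. The function names are made up for illustration, and the sketch assumes a 2's complement machine where int is narrower than 64 bits.

```c
#include <stdint.h>

/* ~0xFF has type int and, on a 2's complement machine, the value -256.
   Converting it to a wider type keeps the bit pattern FF...FF00. */
static int64_t clear_low8_signed(int64_t value)
{
    return value & ~0xFF;   /* -256 is converted to int64_t, value unchanged */
}

static uint64_t clear_low8_unsigned(uint64_t value)
{
    return value & ~0xFF;   /* -256 is converted to UINT64_MAX + 1 - 256 */
}
```

For example, clear_low8_signed(-1) yields -256 and clear_low8_unsigned(UINT64_MAX) yields UINT64_MAX - 0xFF.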
An alternative is to do all masking operations with unsigned math to avoid unexpected properties of int math and int encodings.
The code below has no concerns about sign extension or int overflow. An optimizing compiler will very likely emit efficient code with a simple AND mask. Further, there is no need to code the correct matching max value corresponding to value.
value_low_8bits_cleared = (value | 0xFFu) ^ 0xFFu;
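To see that the one expression works across widths, here is a small sketch that wraps it in a macro and uses it with three different types. CLEAR_LOW8 and the demo_* functions are hypothetical names, not something from the question.

```c
#include <stdint.h>

/* Hypothetical helper: set the low 8 bits, then flip them back to zero.
   No maximum-value constant for the type of v is needed. */
#define CLEAR_LOW8(v) (((v) | 0xFFu) ^ 0xFFu)

static uint32_t demo_u32(uint32_t v) { return CLEAR_LOW8(v); }
static uint64_t demo_u64(uint64_t v) { return CLEAR_LOW8(v); }

/* For a signed type wider than unsigned, 0xFFu is converted to that
   type with the value 255, so the upper bits of v are left alone. */
static int64_t demo_i64(int64_t v) { return CLEAR_LOW8(v); }
```

With these, demo_u32(0xDEADBEEFu) gives 0xDEADBE00 and demo_i64(-1) gives -256.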
I am writing code that may run on architectures of different word size (32-bit, 64-bit, etc) and I want to clear the low byte of a value.
There is a macro (MAX) that is set to the maximum value of a word. So, for example, on a 32-bit system MAX = 0xFFFFFFFF and on a 64-bit system MAX = 0xFFFFFFFFFFFFFFFF for unsigned values.
Although C is designed so that implementations can take machine word size into account, the language itself has no inherent sense of machine words. C cares instead about types, and that makes a difference.
Anyway, I take you exactly at your word that you arrange for the replacement text of macro MAX to be one of the two alternatives you give, depending on the architecture of the machine. Note well that when that replacement text is interpreted as an integer constant, its type may vary between C implementations, and maybe even depending on compiler options.
If I have a word-sized variable that may be signed or unsigned, how can I clear the low byte of the variable with a single expression (no branching)?
The only reason I see for needing a single expression that cannot take the actual type of value explicitly into account is that you want to use the expression in a macro itself. In that case, you need to take great care around type conversions, especially when you have to account for signed types. This makes your MAX macro uncomfortable to work with for your purpose.
I'm inclined to suggest a different approach:
(value | 0xFF) ^ 0xFF
The constant 0xFF will be interpreted as a (signed) int with a positive value. Provided that value's type is not smaller than int, both appearances of 0xFF will be converted to that type without change in value, whether that type is signed or unsigned. Furthermore, the result of each operation and of the overall expression then has the same type as value, so no unexpected conversions occur.
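The claim about the result type can be checked with C11 _Generic. The following is only a sketch: CLEAR_LOW_BYTE, TYPE_TAG and the function names are invented for illustration, and it assumes a C11 (or later) compiler.

```c
/* Hypothetical macro form of the expression above. */
#define CLEAR_LOW_BYTE(v) (((v) | 0xFF) ^ 0xFF)

/* Maps the type of an expression to a small tag so two types can be compared. */
#define TYPE_TAG(x) _Generic((x),                     \
    int: 'i', unsigned int: 'u',                      \
    long: 'l', unsigned long: 'L',                    \
    long long: 'q', unsigned long long: 'Q',          \
    default: '?')

static long long clear_ll(long long v) { return CLEAR_LOW_BYTE(v); }
static unsigned long clear_ul(unsigned long v) { return CLEAR_LOW_BYTE(v); }

/* Returns 1 when the expression has the same type as its operand,
   for one signed and one unsigned type at least as wide as int. */
static int type_preserved(void)
{
    long long s = 0;
    unsigned long u = 0;
    return TYPE_TAG(CLEAR_LOW_BYTE(s)) == TYPE_TAG(s)
        && TYPE_TAG(CLEAR_LOW_BYTE(u)) == TYPE_TAG(u);
}
```

Types narrower than int (char, short) are promoted to int first, which is why the "not smaller than int" condition matters.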
How about
value & ~((intptr_t)0xFF)
First you want a mask that has all bits set except those of the low-order byte:
MAX ^ 0xFF
This converts 0xFF to the same type as MAX and then does the exclusive or with that value. Because MAX has all low-order bits set to 1, these become 0, and the high-order bits stay as they are, that is, 1.
Then you apply that mask to the value that interests you:
value & ( MAX ^ 0xFF )
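A minimal sketch of this approach, assuming a 64-bit target where MAX is defined as UINT64_MAX (a stand-in for the asker's macro; the function names are hypothetical). Note that for a signed value the AND is carried out in the unsigned type, so the result has to be converted back.

```c
#include <stdint.h>

/* Assumption: stand-in for the asker's MAX macro on a 64-bit target. */
#define MAX UINT64_MAX

static uint64_t clear_low_byte_u64(uint64_t value)
{
    return value & (MAX ^ 0xFF);   /* mask is FF...FF00 */
}

static int64_t clear_low_byte_i64(int64_t value)
{
    /* The AND happens in uint64_t. Converting an out-of-range result
       back to int64_t is implementation-defined; common compilers wrap. */
    return (int64_t)((uint64_t)value & (MAX ^ 0xFF));
}
```

The explicit casts only spell out the conversions that would otherwise happen implicitly.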
Here is the easy way to clear the low-order 8 bits:
value &= ~0xFF;