A Wikipedia article on type punning gives an example of pointing an int pointer at a float to extract the sign bit:
However, supposing that floating-point comparisons are expensive, and also supposing that float is represented according to the IEEE floating-point standard, and integers are 32 bits wide, we could engage in type punning to extract the sign bit of the floating-point number using only integer operations:
bool is_negative(float x) {
    unsigned int *ui = (unsigned int *)&x;
    return *ui & 0x80000000;
}
Is it true that pointing a pointer to a type not its own is undefined behavior? The article makes it seem as if this operation is a legitimate and common thing. What are the things that can possibly go wrong in this particular piece of code? I'm interested in both C and C++, if it makes any difference. Both have the strict aliasing rule, right?
Is it true that pointing a pointer to a type not its own is undefined behavior?
No, both C and C++ allow an object pointer to be converted to a different pointer type, with some caveats.
But with a few narrow exceptions, accessing the pointed-to object via the differently-typed pointer does have undefined behavior. Such undefined behavior arises from evaluating the expression *ui in the example function.
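One of those narrow exceptions is access through a character type: both standards permit inspecting the bytes of any object through an unsigned char pointer. A minimal sketch of the same sign test built on that exception follows. The name is_negative_bytes is hypothetical, and the code assumes that float is IEEE 754 binary32 and that unsigned int is 32 bits wide with the same byte order as float.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>

/* Assumption: float and unsigned int occupy the same number of bytes. */
static_assert(sizeof(float) == sizeof(unsigned int),
              "float and unsigned int must have the same size");

/* Hypothetical helper: copies the bytes of x into an unsigned int,
 * touching both objects only through unsigned char pointers, which
 * the aliasing rules allow for any object type. */
bool is_negative_bytes(float x)
{
    unsigned int bits = 0;
    const unsigned char *src = (const unsigned char *)&x;
    unsigned char *dst = (unsigned char *)&bits;

    for (size_t i = 0; i < sizeof x; ++i)
        dst[i] = src[i];

    /* Assumption: the sign is the most significant of 32 bits. */
    return (bits & 0x80000000u) != 0;
}
```

Under those assumptions, is_negative_bytes(-1.0f) is true and is_negative_bytes(1.0f) is false. Because only the sign bit is tested, negative zero also reports true.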
The article makes it seem as if this operation is a legitimate and common thing. What are the things that can possibly go wrong in this particular piece of code?
The behavior is undefined, so anything and everything within the power of the program to do is possible. In practice, the observed behavior might be exactly what the author(s) of the Wikipedia article expected, and if not, then the most likely misbehaviors are variations on the function computing incorrect results.
I'm interested in both C and C++, if it makes any difference. Both have the strict aliasing rule, right?
To the best of my knowledge, the example code has undefined behavior in both C and C++, for substantially the same reason.
The fact that it is technically undefined behaviour to call this is_negative function implies that compilers are legally allowed to "exploit" this fact, e.g., in the code below:
if (condition) {
    is_negative(bar);
} else {
    // do something
}
the compiler may "optimize out" the branch, by evaluating condition and then unconditionally proceeding to the else substatement even if the condition is true.
However, because this would break enormous amounts of existing code, "real" compilers are practically forced to treat is_negative as if it were legitimate. In legal C++, the author's intent is expressed as follows:
unsigned int ui;
memcpy(&ui, &x, sizeof(x));
return ui & 0x80000000;
So the reinterpret_cast approach to type punning, while undefined according to the standard in this case, is thought of by many people as "de facto implementation-defined" and equivalent to the memcpy approach.
If this is undefined behavior then why is it given as a seemingly legitimate example?
This was a common practice before C was standardized and added the rules about aliasing, and it has unfortunately persisted in practice. Nonetheless, Wikipedia pages should not be offering it as an example.
Is it true that pointing a pointer to a type not its own is undefined behavior?
The rules are more complicated than that, but, yes, many uses of an object through an lvalue of a different type are not defined by the C or C++ standards, including this one. There are also rules about pointer conversions that may be violated.
The fact that many compilers support this behavior even though the C and C++ standards do not require them to is not a reason to do so, as there is a simple alternative defined by the standards (use memcpy, below).
In C, an object may be reinterpreted as another type using a union. C++ does not define this:
union { float f; unsigned int ui; } u = { .f = x };
unsigned int ui = u.ui;
or the new value may be obtained more tersely using a compound literal:
(union { float f; unsigned int ui; }) {x} .ui
Naturally, float and unsigned int should have the same size when using this.
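Put together as a complete function, the union approach might look like the sketch below. It is C only (C99 or later for the designated initializer, C11 for static_assert). The name is_negative_union is hypothetical, and the 0x80000000 mask assumes an IEEE 754 binary32 float and a 32-bit unsigned int.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>

/* Turns the "same size" requirement into a compile-time check. */
static_assert(sizeof(float) == sizeof(unsigned int),
              "float and unsigned int must have the same size");

/* Hypothetical helper: stores x in one union member and reads the
 * bytes back through the other member, which C defines. */
bool is_negative_union(float x)
{
    union { float f; unsigned int ui; } u = { .f = x };

    /* Assumption: the sign is the most significant of 32 bits. */
    return (u.ui & 0x80000000u) != 0;
}
```

The static_assert makes a size mismatch a compile error rather than a silent wrong answer.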
Both C and C++ support reinterpreting an object by copying the bytes that represent it:
unsigned int ui;
memcpy(&ui, &x, sizeof ui);
Naturally, float and unsigned int should have the same size when using this. The above is C code; C++ requires std::memcpy or a suitable using declaration.
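As a complete C function, the memcpy approach could be sketched as follows. The name is_negative_memcpy is hypothetical, and the mask again assumes an IEEE 754 binary32 float and a 32-bit unsigned int.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <string.h>   /* memcpy */

/* Turns the "same size" requirement into a compile-time check. */
static_assert(sizeof(float) == sizeof(unsigned int),
              "float and unsigned int must have the same size");

/* Hypothetical helper: copies the object representation of x into
 * an unsigned int, then tests the sign bit with integer operations. */
bool is_negative_memcpy(float x)
{
    unsigned int ui;
    memcpy(&ui, &x, sizeof ui);

    /* Assumption: the sign is the most significant of 32 bits. */
    return (ui & 0x80000000u) != 0;
}
```

Optimizing compilers commonly reduce a small fixed-size memcpy like this to a plain register move, so the defined version need not cost more than the pointer cast; that is typical behavior, not something the standards guarantee.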
Accessing data through pointers (or unions) seems pretty common in (embedded) C code, but it often requires extra knowledge.
When the C Standard characterizes an action as invoking Undefined Behavior, that implies that at least one of the following is true:
One of the reasons the Standard leaves some actions undefined is, among other things, to "identify areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." A common extension, listed in the Standard as one of the ways implementations may process constructs that invoke "Undefined Behavior", is to process some such constructs by "behaving during translation or program execution in a documented manner characteristic of the environment".
I don't think the code listed in the example claims to be 100% portable. As such, the fact that it invokes Undefined Behavior does not preclude the possibility of it being non-portable but correct. Some compiler writers believe that the Standard was intended to deprecate non-portable constructs, but such a notion is contradicted by both the text of the Standard and the published Rationale. According to the published Rationale, the authors of the Standard wanted to give programmers a "fighting chance" [their term] to write portable code, and defined a category of maximally portable programs, but did not specify portability as a requirement for anything other than strictly conforming C programs, and they expressly did not wish to demean programs that were conforming but not strictly conforming.