Why does the double conversion of some integers (to float and back to int) not remain equal to its original number, yet some do?

Question

I have two integer variables:

int i1 = 0xdeadbeef and int i2 = 0xffffbeef.

(11011110101011011011111011101111 or 3735928559, and 11111111111111111011111011101111 or 4294950639, respectively).

(int) (float) i1 == i1 evaluates as false, yet (int) (float) i2 == i2 evaluates as true.

Why is this? In this system, both ints and floats are stored in 4 bytes.

c

integer

hex

asked on Stack Overflow Feb 24, 2021 by

Aidan M

3 Answers

This is because a float has far less precision than an int, so it can't store every possible int value exactly. A value that doesn't fit is rounded to the nearest representable float; a value that happens to fit survives the round trip unchanged.

A 32-bit float can store only 24 significand bits of numerical data (23 stored explicitly plus an implicit leading 1). The remaining bits hold the sign and the exponent, and certain exponent patterns are reserved for NaN and infinity.

A double does have the required precision, as it's usually a 64-bit representation with a 53-bit significand.

answered on Stack Overflow Feb 24, 2021 by

tadman

Lots of conversions going on.

int i1 = 0xdeadbeef; int i2 = 0xffffbeef; incur implementation-defined conversions, as the constants are out of int range. Here, they are "wrapped".

i2 is a small value (15 significant bits) exactly representable as a float.

i1 is not. i1 has 30 significant bits, 6 more than the 24 of float. Those lower 6 bits are not all 0, so (float) i1 results in a rounded value.

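#include <stdio.h>

int main(void) {
  /* Both constants exceed INT_MAX; the conversion to int is implementation-defined. */
  int i1 = 0xdeadbeef;
  int i2 = 0xffffbeef;
  /* Round-trip comparisons: 1 if the value survives int -> float -> int. */
  printf("%d\n", (int) (float) i1 == i1);
  printf("%d\n", (int) (float) i2 == i2);
  /* Columns: unsigned constant, wrapped int, float value, value converted back to int. */
  printf("%u %10d %17f %10d\n", 0xdeadbeef, i1, (float) i1, (int) (float) i1);
  printf("%u %10d %17f %10d\n", 0xffffbeef, i2, (float) i2, (int) (float) i2);
}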

Output

0
1
3735928559 -559038737 -559038720.000000 -559038720
4294950639     -16657     -16657.000000     -16657

answered on Stack Overflow Feb 24, 2021 by

chux - Reinstate Monica • edited Feb 24, 2021 by

chux - Reinstate Monica

C implementations commonly use a 32-bit int, and 0xdeadbeef does not fit in a 32-bit int (one sign bit and 31 value bits). Initializing i1 with 0xdeadbeef results in a conversion to int. This conversion is implementation-defined. GCC, for example, defines it to wrap modulo 2³², and this is not uncommon.

So int i1 = 0xdeadbeef; initializes i1 to deadbeef₁₆ − 2³² = 3735928559 − 2³² = −559038737 = −21524111₁₆. As you can see from the 8 hexadecimal digits in “−21524111,” this number spans 30 bits from its leading 1 bit to its trailing 1 bit, inclusive (32 bits in 8 digits, but the first two are zeros). The format commonly used for float, IEEE-754 binary32, has only 24 bits in its significand. Any number spanning more than 24 bits in its significant bits does not fit in the format and will be rounded when converted to this float format. So i1 != (int) (float) i1.

In contrast, int i2 = 0xffffbeef; initializes i2 to ffffbeef₁₆ − 2³² = 4294950639 − 2³² = −16657 = −4111₁₆. This spans 15 bits (16 bits in 4 digits, but the first one is a zero). So it fits in the 24 bits of a float significand, and its value does not change when converted to float. So i2 == (int) (float) i2.

answered on Stack Overflow Feb 24, 2021 by

Eric Postpischil

User contributions licensed under CC BY-SA 3.0