Weird Behavior of large float in printf() and assigned to an int

As per my calculation to convert the float values into the binary value stored by the computer (Sign, Exponent, Mantissa format), out of 32 bits, 1 bit is reserved for sign, 8 bits for Exponent.

So only 23 bits are remaining to represent the number.

So I am thinking that the range of float values with correct behavior would be only 0-0xffffff (basically only 3 bytes value) and not 0-0xffffffff. Is this correct?

And is the weird behavior of the below code related to this concept?

#include <stdio.h>

int main(void)
{
    float a=2555555555;
    printf("%f\n",a);
    int b = a;
    printf("%d",b);
    return 0;
}

Output:

a = 2555555584.000000
b = -2147483648

Tags: c, floating-point, int, printf
asked on Stack Overflow Aug 11, 2019 by Pavankumar S V • edited Aug 11, 2019 by S.S. Anne

3 Answers

The first thing to note about your program is that the value 2555555555 exceeds the bounds of a (signed) 32-bit integer type. From the output of your program it seems that int is a 32-bit type on your system, so 2555555555 cannot be an int.

According to the standard, 2555555555 will then be either a long int or long long int, depending on whether long int is big enough.
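One way to see which type the compiler picked for the literal is a C11 _Generic selection. This is only a sketch: TYPE_NAME, type_of_literal and type_of_small_literal are made-up names for illustration, and the result depends on the platform ("long" where long is 64 bits, "long long" where long is 32 bits).

```c
#include <assert.h>
#include <limits.h>

/* TYPE_NAME is a hypothetical helper macro, not part of any library.
   It needs C11 or later for _Generic. */
#define TYPE_NAME(x) _Generic((x), \
    int: "int", \
    long: "long", \
    long long: "long long", \
    default: "other")

/* Returns the name of the type given to the unsuffixed literal. */
const char *type_of_literal(void)
{
    /* Assumption: int is too narrow for the value, as on the asker's system. */
    assert(INT_MAX < 2555555555);
    return TYPE_NAME(2555555555);
}

/* Returns the name of the type of a literal that does fit in int. */
const char *type_of_small_literal(void)
{
    return TYPE_NAME(255);
}
```

Printing type_of_literal() with puts() should show either "long" or "long long", never "int", on a system with a 32-bit int.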

The next relevant part is this:

When a value of integer type is converted to a real floating type, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is either the nearest higher or nearest lower representable value, chosen in an implementation-defined manner. If the value being converted is outside the range of values that can be represented, the behavior is undefined.

(Emphasis mine.)

Because 2555555555 requires at least 32 bits of mantissa to be represented exactly, it does not fit into a 32-bit float, whose significand holds only 24 bits. That is why a ends up containing 2555555584, the closest representable value of type float.
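To make "closest representable value" concrete: between 2^31 and 2^32 adjacent floats are 256 apart, so 2555555555 sits between 2555555328 and 2555555584 and is only 29 away from the latter. The sketch below finds the neighbours by stepping the bit pattern. It assumes float is IEEE-754 binary32 (true on most current platforms, but not guaranteed by the C standard), and next_float_up and next_float_down are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the next float above a positive, finite f.
   Assumption: float is IEEE-754 binary32, where adjacent bit patterns
   of positive values are adjacent floats. */
float next_float_up(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    assert(f > 0.0f);
    memcpy(&bits, &f, sizeof bits);   /* read the bit pattern */
    bits += 1;                        /* step to the next pattern */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: the next float below a positive, finite f. */
float next_float_down(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    assert(f > 0.0f);
    memcpy(&bits, &f, sizeof bits);
    bits -= 1;                        /* step to the previous pattern */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under that assumption, next_float_down(2555555584.0f) is 2555555328 and next_float_up(2555555584.0f) is 2555555840, so no float lies closer to 2555555555 than 2555555584 does.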

For int b = a; this section applies:

When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

(Emphasis mine.)

2555555584, the integer value in question, cannot be represented by a signed 32-bit int. This part of your code has undefined behavior.
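If you need the conversion to be safe, check the range before converting. The following is a minimal sketch, not a standard function: float_to_int_checked is a hypothetical name, and the bounds assume a two's complement int so that INT_MIN is a power of two and converts to float exactly.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores (int)f in *out and returns true only when
   the truncated value of f fits in int. On failure *out is left alone. */
bool float_to_int_checked(float f, int *out)
{
    assert(out != NULL);
    /* Assumption: two's complement int, so INT_MIN == -INT_MAX - 1. */
    assert(INT_MIN == -INT_MAX - 1);

    const float lower = (float)INT_MIN;   /* exact: a power of two */
    const float upper = -(float)INT_MIN;  /* INT_MAX + 1, also exact */

    /* (float)INT_MAX would round up, so compare against INT_MAX + 1
       with a strict '<'. NaN fails the comparison and is rejected. */
    if (!(f >= lower && f < upper))
        return false;

    *out = (int)f;                        /* now well defined */
    return true;
}
```

With a 32-bit int, float_to_int_checked(2555555555.0f, &b) returns false and leaves b unchanged, while float_to_int_checked(-3.9f, &b) returns true and sets b to -3.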

answered on Stack Overflow Aug 11, 2019 by melpomene

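The conversion to int overflows: the maximum value of a 32-bit signed int is 2,147,483,647, and 2555555584 is larger than that.

Converting to e.g. unsigned int works, provided unsigned int is at least 32 bits wide (the value is below 4,294,967,296):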

printf("%u", (unsigned int)2555555555.f);  // prints 2555555584
answered on Stack Overflow Aug 11, 2019 by krisz • edited Aug 11, 2019 by krisz

So only 23 bits are remaining to represent the number.

This is incorrect. There's one hidden bit, so the significand is 24 bits long.

So I am thinking that the range of float values with correct behavior would be only 0-0xffffff (basically only 3 bytes value) and not 0-0xffffffff. Is this correct?

This is also incorrect. Did you forget the exponent part? float can represent any value of the form significand × 2^exponent, where the significand lies in the range [0, 2^24 - 1] and the exponent is within the range the format allows.
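A quick way to test whether a particular integer is exactly representable is to round-trip it through float. This is only a sketch: fits_float_exactly is a hypothetical name, and the input is restricted so that converting back to long long stays well defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when n survives a round trip through float,
   i.e. when n is exactly representable as a float. */
bool fits_float_exactly(long long n)
{
    /* Keep well away from the long long limits so that the conversion
       back from float cannot overflow. */
    assert(n > -(1LL << 62) && n < (1LL << 62));

    float f = (float)n;        /* may round to a neighbouring float */
    return (long long)f == n;  /* unchanged only if nothing was lost */
}
```

With a 24-bit significand, 16777216 (2^24) passes and 16777217 fails, yet the much larger 2555555584 passes too, because only its top 24 bits are non-zero. So the limit is on the number of significant bits, not on the magnitude.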

But how is 2555555584 the closest representable value? If 2555555555 cannot be represented, the same should apply to 2555555584, right?

2555555584 = 0x9852AF × 2^8, so it's exactly representable in single precision, as the significand 0x9852AF fits in exactly 24 bits.
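You can recover that decomposition from the stored bits. The sketch below assumes float is IEEE-754 binary32 (1 sign bit, 8 exponent bits with a bias of 127, 23 stored fraction bits) and only handles positive normal values. decompose_float is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: splits a positive normal float into a 24-bit
   integer significand (hidden bit included) and a power-of-two exponent
   such that f == significand * 2^exponent.
   Assumption: float is IEEE-754 binary32. */
void decompose_float(float f, uint32_t *significand, int *exponent)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);

    uint32_t exp_field = (bits >> 23) & 0xFFu;
    /* Only positive normal numbers are handled in this sketch. */
    assert((bits >> 31) == 0 && exp_field != 0 && exp_field != 0xFFu);

    *significand = (bits & 0x7FFFFFu) | 0x800000u; /* restore hidden bit */
    *exponent = (int)exp_field - 127 - 23;         /* unbias, then scale
                                                      for 23 fraction bits */
}
```

For 2555555584.0f this yields a significand of 0x9852AF and an exponent of 8, matching the decomposition above.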

That's the behavior up to the first printf. After that, you're converting a float value outside int's range to int, which invokes undefined behavior. See C What happens when casting floating point types to unsigned integer types when the value would overflow

answered on Stack Overflow Aug 17, 2019 by phuclv

User contributions licensed under CC BY-SA 3.0