What is going on with this int to float conversion, why is it inaccurate?

Question

So I've basically got this code:

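#include <stdio.h>

int main(void)
{
    int n = 0x7fffffff;   /* largest value of a 32-bit signed int */
    float f = n;          /* implicit int-to-float conversion */

    printf("%d\n", n);
    printf("%f\n", f);

    n = 0x00ffffff;       /* 24 one-bits */
    f = n;

    printf("%d\n", n);
    printf("%f\n", f);
}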

This gives this output:

>     2147483647
>     2147483648.000000
>     16777215
>     16777215.000000

Why is there a difference between the first two numbers, but not between the second two? I thought any integer could be represented by a float in C. Why does this happen?

c

asked on Stack Overflow Sep 11, 2018 by

PEREZje

1 Answer

Unfortunately, you thought wrong.

On a typical implementation with 32-bit ints and 32-bit floats, it is obvious that a float cannot contain all ints exactly, as some of its bits must be used for the exponent, to make it floating point.

If your platform is IEEE-754 compatible, and your float is single-precision, its 32 bits break down like this:

1 bit - sign
8 bits - exponent
24 bits¹ - significand

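This means that every integer that fits in 24 bits (magnitude up to 2^24 = 16777216) can be represented exactly. Beyond that, precision is necessarily lost for some numbers. In your example, 0x00ffffff fits in exactly 24 bits and survives the conversion, while 0x7fffffff needs 31 bits and is rounded to the nearest representable float, which is 2147483648 (2^31).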
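To see where exactness stops, here is a minimal sketch of a round-trip check. The function name `survives_float_round_trip` is made up for illustration, and the code assumes a 32-bit int and an IEEE-754 single-precision float, as above.

```c
/* Hypothetical helper: returns 1 if n survives an int -> float -> int
   round trip unchanged, 0 otherwise.
   Assumes 32-bit int and IEEE-754 single-precision float. */
int survives_float_round_trip(int n)
{
    /* volatile is a precaution so the value is really rounded to float
       precision and not kept in a wider register */
    volatile float f = (float)n;

    /* Values near INT_MAX round up to 2^31, which no longer fits in an
       int. Converting that back would be undefined behavior, so reject
       it before the cast. */
    if (f >= 2147483648.0f || f < -2147483648.0f)
        return 0;

    return (int)f == n;
}
```

Under those assumptions, 16777215 and 16777216 should pass, 16777217 should fail (it is the first positive integer that needs 25 bits), and 0x7fffffff should fail as well.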

With the same assumptions, a double will hold every 32-bit integer exactly, as a double has 53 bits of precision.
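Since a double holds these values exactly, it can be used to measure how far off the float conversion is. This is only a sketch: `float_conversion_error` is a name invented here, and it relies on the same assumptions (32-bit int, IEEE-754 float and double).

```c
/* Hypothetical helper: how far the float conversion of n lands from n.
   Assumes 32-bit int, IEEE-754 float (24-bit significand) and
   double (53-bit significand), so both casts to double are exact. */
double float_conversion_error(int n)
{
    /* volatile is a precaution so the value is really rounded to float */
    volatile float f = (float)n;

    return (double)f - (double)n;
}
```

For the two values in the question this should give 1.0 for 0x7fffffff (the float is one too large) and 0.0 for 0x00ffffff.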

References:

¹: Only 23 bits are stored, but for normal numbers the top bit is always taken to be 1. If the top bit would otherwise be zero, the significand is shifted left and the exponent decreased to compensate. This gets us an extra bit of precision that doesn't need to be stored.
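If you want to look at the three fields yourself, something along these lines should work. The struct and function names are invented for this sketch, and it assumes float is a 32-bit IEEE-754 single-precision value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a single-precision float. */
struct float_fields {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;   /* 23 stored bits; the implicit leading 1 is not included */
};

/* Splits f into its raw fields. Assumes sizeof(float) == 4 and IEEE-754 layout. */
struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields out;

    /* memcpy is the portable way to reinterpret the object representation */
    memcpy(&bits, &f, sizeof bits);

    out.sign     = (unsigned)(bits >> 31);
    out.exponent = (unsigned)((bits >> 23) & 0xFFu);
    out.fraction = bits & 0x7FFFFFu;
    return out;
}
```

For 16777215.0f the 23 stored fraction bits are all ones (0x7FFFFF) and the stored exponent is 150 (23 + 127); the 24th one-bit is the implicit one. For 2147483648.0f the fraction is 0 and the stored exponent is 158 (31 + 127).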

answered on Stack Overflow Sep 11, 2018 by

Max • edited Sep 11, 2018 by

Max

User contributions licensed under CC BY-SA 3.0