function to convert float to int (huge integers)


This is a university question. Just to make sure :-) We need to implement (float)x

I have the following code which must convert integer x to its floating point binary representation stored in an unsigned integer.

unsigned float_i2f(int x) {
  if (!x) return x;

  /* get sign of x */
  int sign = (x>>31) & 0x1;

  /* absolute value of x */
  int a = sign ? ~x + 1 : x;

  /* calculate exponent */
  int e = 0;
  int t = a;
  while(t != 1) {
    /* divide by two until t is 1 */
    t >>= 1;
    e++;
  };

  /* calculate mantissa */
  int m = a << (32 - e);
  /* logical right shift */
  m = (m >> 9) & ~(((0x1 << 31) >> 9 << 1));

  /* add bias for 32bit float */
  e += 127;

  int res = sign << 31;
  res |= (e << 23);
  res |= m;

  /* lots of printf */

  return res;
}

One problem I encounter now is that when my integers are too big then my code fails. I have this control procedure implemented:

float f = (float)x;
unsigned int r;
memcpy(&r, &f, sizeof(unsigned int));

This of course always produces the correct output.

Now when I do some test runs, these are my outputs (GOAL is what it needs to be, result is what I got):

:!make && ./btest -f float_i2f -1 0x80004999                                                                  
make: Nothing to be done for `all'.
Score   Rating  Errors  Function
x: [-2147464807]        10000000000000000100100110011001
sign: 1
expone: 01001110100000000000000000000000
mantis: 00000000011111111111111101101100
result: 11001110111111111111111101101100
GOAL:   11001110111111111111111101101101

So in this case, a 1 is added as the LSB.

Next case:

:!make && ./btest -f float_i2f -1 0x80000001
make: Nothing to be done for `all'.
Score   Rating  Errors  Function
x: [-2147483647]        10000000000000000000000000000001
sign: 1
expone: 01001110100000000000000000000000
mantis: 00000000011111111111111111111111
result: 11001110111111111111111111111111
GOAL:   11001111000000000000000000000000

Here the goal has 1 added to the exponent and a mantissa of all zeros, whereas my result has a mantissa of all ones.

I spent hours trying to look it up on the internet and in my books, but I can't find any references to this problem. I guess it has something to do with the fact that the mantissa is only 23 bits. But how do I have to handle it then?

EDIT: THIS PART IS OBSOLETE THANKS TO THE COMMENTS BELOW. int l must be unsigned l.

int x = 2147483647; 
float f = (float)x;

int l = f;
printf("l: %d\n", l);

then l becomes -2147483648.

How can this happen? So C is doing the casting wrong?

Hope someone can help me here! Thx Markus

EDIT 2:

My updated code is now this:

unsigned float_i2f(int x) {
  if (x == 0) return 0;
  /* get sign of x */
  int sign = (x>>31) & 0x1;

  /* absolute value of x */
  int a = sign ? ~x + 1 : x;

  /* calculate exponent */
  int e = 158;
  int t = a;
  while (!(t >> 31) & 0x1) {
    t <<= 1;
    e--;
  };

  /* calculate mantissa */
  int m = (t >> 8) & ~(((0x1 << 31) >> 8 << 1));
  m &= 0x7fffff;

  int res = sign << 31;
  res |= (e << 23);
  res |= m;

  return res;
}

I also figured out that the code works for all integers in the range [-2^24, 2^24]. Everything above or below that sometimes works but mostly doesn't.

Something is missing, but I really have no idea what. Can anyone help me?

c
bit-manipulation
asked on Stack Overflow Sep 6, 2014 by markus_p • edited Sep 6, 2014 by markus_p

1 Answer


The value printed is explainable: it depends entirely on the underlying representation of the numbers being cast. Once you understand the binary representation of the number, the result is no longer surprising.

An implicit conversion is associated with the assignment operator (see C99 Standard 6.5.16). The C99 Standard goes on to say:

6.3.1.4 Real floating and integer When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

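Your earlier example illustrates undefined behavior caused by converting a value that is outside the range of the destination type. On a typical IEEE 754 implementation, 2147483647 cannot be represented exactly in a float's 24-bit significand, so (float)x rounds up to 2147483648.0. That value does not fit in a 32-bit int, so converting it back to int is undefined.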
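A minimal sketch of that out-of-range case, assuming a 32-bit int and an IEEE 754 single-precision float. The helper name fits_in_int is hypothetical, not something from the standard library. It checks the range in floating point first, so the undefined conversion is never executed:

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if f can be converted to int without
   undefined behavior, 0 otherwise (including for NaN).
   Assumes a 32-bit int, so (float)INT_MIN is exactly -2^31 and
   -(float)INT_MIN is exactly 2^31. */
static int fits_in_int(float f) {
    return f >= (float)INT_MIN && f < -(float)INT_MIN;
}

/* Hypothetical helper: converts only when it is safe to do so and
   falls back to a caller-supplied value otherwise. */
static int float_to_int_or(float f, int fallback) {
    return fits_in_int(f) ? (int)f : fallback;
}
```

With these, fits_in_int((float)INT_MAX) is expected to be 0 on such a platform, because (float)INT_MAX has already been rounded up to 2^31 before the check runs.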

The asserts in the following snippet ought to prevent any undefined behavior from occurring.

#include <assert.h>
#include <limits.h>
#include <math.h>
unsigned int convertFloatingPoint(double v) {
   double d;
   assert(isfinite(v));
   d = trunc(v);
   assert((d>=0.0) && (d<=(double)UINT_MAX));
   return (unsigned int)d;
}

If what you want is to look at the bits rather than convert the value, one way is to create a union containing a 32-bit integer and a float. The int and the float are then just two different ways of looking at the same piece of memory:

union {
    int    myInt;
    float myFloat;
} my_union;

my_union.myInt = 0xBFFFF2E5;

printf("float is %f\n", my_union.myFloat);

float is -1.999600
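The same reinterpretation can also be sketched with memcpy, which is what the control procedure in the question uses. The helper names below are hypothetical, and the sketch assumes that float is 32 bits wide:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a float.
   Assumes sizeof(float) == sizeof(uint32_t). */
static uint32_t float_to_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the bytes, no value conversion */
    return bits;
}

/* Hypothetical helper: build a float from a raw 32-bit pattern. */
static float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On an IEEE 754 platform, bits_to_float(0xBFFFF2E5u) should yield the same value as the union example, and float_to_bits(1.0f) should be 0x3F800000.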

With a cast, you are telling the compiler to take the number you have (a large integer) and convert it into a float, not to interpret the number's bits as a float. To do the latter, you need to tell the compiler to read the value at that address as a different type, like this:

myFloat = *(float *)&myInt ;

That means, if we take it apart, starting from the right:

  • &myInt - the location in memory that holds your integer.
  • (float *) - really, I want the compiler to use this as a pointer to float, not whatever the compiler thinks it may be.
  • * - read from the address of whatever is to the right.
  • myFloat = - set this variable to whatever is to the right.

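So, you are telling the compiler: at the location of myInt there is a floating point number, now put that float into myFloat. Strictly speaking, reading an int object through a float pointer breaks C's aliasing rules (C99 6.5, paragraph 7), so the union shown earlier or memcpy is the safer way to do this.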
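Coming back to the conversion the question asks about: the updated code truncates the 8 low bits that do not fit into the significand, while (float)x rounds them, which would explain why only values beyond 2^24 in magnitude go wrong. The following is a minimal sketch rather than a definitive implementation. It assumes a 32-bit int and round-to-nearest with ties to even, which is the default rounding mode on IEEE 754 systems. It keeps the question's function name but works in unsigned arithmetic:

```c
#include <stdint.h>

unsigned float_i2f(int x) {
    if (x == 0) return 0;

    uint32_t sign = (x < 0) ? 0x80000000u : 0u;

    /* Magnitude in unsigned arithmetic, so INT_MIN does not overflow. */
    uint32_t a = (x < 0) ? 0u - (uint32_t)x : (uint32_t)x;

    /* Normalize: shift left until the leading 1 sits in bit 31.
       158 = 127 (bias) + 31 (position of the leading bit). */
    uint32_t e = 158;
    while (!(a & 0x80000000u)) {
        a <<= 1;
        e--;
    }

    /* Top 24 bits: the implicit leading 1 plus 23 fraction bits. */
    uint32_t m = a >> 8;
    /* The 8 bits that do not fit and therefore decide the rounding. */
    uint32_t rest = a & 0xFFu;

    /* Round to nearest; on an exact tie, round to the even significand. */
    if (rest > 0x80u || (rest == 0x80u && (m & 1u)))
        m++;

    /* Rounding may carry out of the significand (0xFFFFFF + 1):
       renormalize by bumping the exponent. */
    if (m >> 24) {
        m >>= 1;
        e++;
    }

    /* Drop the implicit leading 1 and assemble the fields. */
    return sign | (e << 23) | (m & 0x7FFFFFu);
}
```

For the two test cases in the question, this sketch gives 0xCEFFFF6D for 0x80004999 and 0xCF000000 for 0x80000001, which match the GOAL lines. In the second case the rounding carries all the way into the exponent, which is why the goal shows an exponent one higher and an all-zero mantissa.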

answered on Stack Overflow Sep 6, 2014 by Vineet1982

User contributions licensed under CC BY-SA 3.0