This is a university question. Just to make sure :-) We need to implement (float)x
I have the following code which must convert integer x to its floating point binary representation stored in an unsigned integer.
unsigned float_i2f(int x) {
if (!x) return x;
/* get sign of x */
int sign = (x>>31) & 0x1;
/* absolute value of x */
int a = sign ? ~x + 1 : x;
/* calculate exponent */
int e = 0;
int t = a;
while(t != 1) {
/* divide by two until t is 0*/
t >>= 1;
e++;
};
/* calculate mantissa */
int m = a << (32 - e);
/* logical right shift */
m = (m >> 9) & ~(((0x1 << 31) >> 9 << 1));
/* add bias for 32bit float */
e += 127;
int res = sign << 31;
res |= (e << 23);
res |= m;
/* lots of printf */
return res;
}
One problem I encounter now is that when my integers are too big then my code fails. I have this control procedure implemented:
float f = (float)x;
unsigned int r;
memcpy(&r, &f, sizeof(unsigned int));
This of course always produces the correct output.
Now when I do some test runs, this are my outputs (GOAL is what It needs to be, result is what I got)
:!make && ./btest -f float_i2f -1 0x80004999
make: Nothing to be done for `all'.
Score Rating Errors Function
x: [-2147464807] 10000000000000000100100110011001
sign: 1
expone: 01001110100000000000000000000000
mantis: 00000000011111111111111101101100
result: 11001110111111111111111101101100
GOAL: 11001110111111111111111101101101
So in this case, a 1 is added as the LSB.
Next case:
:!make && ./btest -f float_i2f -1 0x80000001
make: Nothing to be done for `all'.
Score Rating Errors Function
x: [-2147483647] 10000000000000000000000000000001
sign: 1
expone: 01001110100000000000000000000000
mantis: 00000000011111111111111111111111
result: 11001110111111111111111111111111
GOAL: 11001111000000000000000000000000
Here 1 is added to the exponent while the mantissa is the complement of it.
I tried hours to look ip up on the internet plus in my books etc but I can't find any references to this problem. I guess It has something to do with the fact that the mantissa is only 23 bits. But how do I have to handle it then?
EDIT: THIS PART IS OBSOLETE THANKS TO THE COMMENTS BELOW. int l must be unsigned l.
int x = 2147483647;
float f = (float)x;
int l = f;
printf("l: %d\n", l);
then l becomes -2147483648.
How can this happen? So C is doing the casting wrong?
Hope someone can help me here! Thx Markus
EDIT 2:
My updated code is now this:
unsigned float_i2f(int x) {
if (x == 0) return 0;
/* get sign of x */
int sign = (x>>31) & 0x1;
/* absolute value of x */
int a = sign ? ~x + 1 : x;
/* calculate exponent */
int e = 158;
int t = a;
while (!(t >> 31) & 0x1) {
t <<= 1;
e--;
};
/* calculate mantissa */
int m = (t >> 8) & ~(((0x1 << 31) >> 8 << 1));
m &= 0x7fffff;
int res = sign << 31;
res |= (e << 23);
res |= m;
return res;
}
I also figured out that the code works for all integers in the range -2^24, 2^24. Everything above/below sometimes works but mostly doesn't.
Something is missing, but I really have no idea what. Can anyone help me?
The answer printed is absolutely correct as it's totally dependent on the underlying representation of numbers being cast. However, If we understand the binary representation of the number, you won't get surprised with this result.
To understand an implicit conversion is associated with the assignment operator (ref C99 Standard 6.5.16). The C99 Standard goes on to say:
6.3.1.4 Real floating and integer When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
Your earlier example illustrates undefined behavior due to assigning a value outside the range of the destination type. Trying to assign a negative value to an unsigned type, not from converting floating point to integer.
The asserts in the following snippet ought to prevent any undefined behavior from occurring.
#include <limits.h>
#include <math.h>
unsigned int convertFloatingPoint(double v) {
double d;
assert(isfinite(v));
d = trunc(v);
assert((d>=0.0) && (d<=(double)UINT_MAX));
return (unsigned int)d;
}
Another way for doing the same thing, Create a union containing a 32-bit integer and a float. The int and float are now just different ways of looking at the same bit of memory;
union {
int myInt;
float myFloat;
} my_union;
my_union.myInt = 0x BFFFF2E5;
printf("float is %f\n", my_union.myFloat);
float is -1.999600
You are telling the compiler to take the number you have (large integer) and make it into a float, not to interpret the number AS float. To do that, you need to tell the compiler to read the number from that address in a different form, so this:
myFloat = *(float *)&myInt ;
That means, if we take it apart, starting from the right:
* -
read from the address of whatever is to the right.So, you are telling the compiler: In the location of (myInt), there is a floating point number, now put that float into myFloat.
User contributions licensed under CC BY-SA 3.0