Rounding issues when converting int to float bitwise


I am working on a homework assignment where we are supposed to convert an int to a float using bitwise operations. The following code works, except when the result needs rounding: my function always seems to round down, but in some cases it should round up.

For example, 0x80000001 (that is, -2147483647) should be represented as 0xcf000000 (exponent 31, stored mantissa 0), but my function returns 0xceffffff (exponent 30, stored mantissa 0x7fffff).
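To see exactly how the two bit patterns differ, a small decoder can split a binary32 pattern into its sign, unbiased exponent, and 23-bit stored fraction (which excludes the implicit leading 1). This is a minimal sketch that assumes `unsigned` is 32 bits and only handles normal numbers; `decode_f32` and `struct f32_fields` are hypothetical names.

```c
/* Fields of an IEEE 754 binary32 bit pattern (normal numbers only).
   Assumes unsigned is 32 bits. */
struct f32_fields {
  unsigned sign;      /* bit 31 */
  int exponent;       /* bits 30..23 with the bias of 127 removed */
  unsigned fraction;  /* bits 22..0; the implicit leading 1 is not stored */
};

/* Hypothetical helper: split a bit pattern into its three fields. */
struct f32_fields decode_f32(unsigned bits) {
  struct f32_fields f;
  f.sign = bits >> 31;
  f.exponent = (int)((bits >> 23) & 0xffu) - 127;
  f.fraction = bits & 0x7fffffu;
  return f;
}
```

`decode_f32(0xcf000000u)` gives sign 1, exponent 31, fraction 0, i.e. -2^31. `decode_f32(0xceffffffu)` gives sign 1, exponent 30, fraction 0x7fffff, i.e. -(2^31 - 128) = -2147483520. The input -2147483647 is 1 away from -2^31 but 127 away from -2147483520, so -2^31 is the nearest float.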

I am not sure how to fix these rounding issues. What steps should I take to make this work?

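```c
/* Assumes unsigned is 32 bits. Truncates: the low 8 bits of y are discarded. */
unsigned float_i2f(int x) {
  if (x == 0) return 0;
  unsigned sign = 0;
  unsigned y = (unsigned)x;
  if (x < 0) {
    sign = 0x80000000u;   /* sign bit; avoids the undefined 1 << 31 on int */
    y = 0u - y;           /* magnitude; well-defined even for INT_MIN */
  }
  unsigned exp = 31;
  /* normalize: shift until the leading 1 sits in bit 31 */
  while ((y & 0x80000000u) == 0) {
    exp--;
    y <<= 1;
  }
  /* keep the top 24 bits; the low 8 bits are simply dropped */
  unsigned mantissa = y >> 8;

  /* drop the implicit leading 1 and assemble sign | biased exponent | fraction */
  return sign | ((exp + 127) << 23) | (mantissa & 0x7fffff);
}
```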
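While debugging, one option is to compare results against the compiler's own conversion. This is a testing aid only, since the assignment presumably forbids float operations in the solution itself. `reference_i2f` is a hypothetical name, and the sketch assumes `float` is IEEE 754 binary32, `unsigned` is 32 bits, and the compiler rounds int-to-float conversions to nearest, ties to even (typical, but implementation-defined in C).

```c
#include <string.h>

/* Testing aid: assumes float is IEEE 754 binary32 and unsigned is 32 bits. */
_Static_assert(sizeof(float) == sizeof(unsigned), "float and unsigned must be the same size");

/* Hypothetical helper: bit pattern of the compiler's int -> float conversion. */
unsigned reference_i2f(int x) {
  float f = (float)x;               /* rounding here is implementation-defined,
                                       usually round to nearest, ties to even */
  unsigned bits;
  memcpy(&bits, &f, sizeof bits);   /* read the representation without aliasing problems */
  return bits;
}
```

A loop comparing `float_i2f(x)` with `reference_i2f(x)` over many values of x quickly finds inputs where the two disagree, such as -2147483647.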

This is possibly a duplicate of another question, but that question was not properly answered.

Tags: c, bit-manipulation, rounding, floating, floating-point-conversion
asked on Stack Overflow Sep 8, 2014 by jamiees2 • edited May 23, 2017 by Community

1 Answer


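You are ignoring the lowest 8 bits of y when you calculate mantissa, so the magnitude is always truncated (rounded toward zero) instead of rounded to the nearest representable value.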
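The discarded bits can be made visible with a small variant of the question's normalization loop. This sketch (`split_normalized` is a hypothetical helper; `unsigned` is assumed to be 32 bits) reports the 24 bits that `y >> 8` keeps and the 8 bits it throws away.

```c
/* Hypothetical helper: normalize a nonzero magnitude like the question's loop
   and report what y >> 8 keeps and discards. Assumes unsigned is 32 bits. */
void split_normalized(unsigned mag, unsigned *kept, unsigned *dropped, unsigned *exp) {
  unsigned y = mag;
  unsigned e = 31;
  while ((y & 0x80000000u) == 0) {  /* mag must be nonzero or this never ends */
    e--;
    y <<= 1;
  }
  *kept = y >> 8;        /* top 24 bits: implicit 1 plus 23 fraction bits */
  *dropped = y & 0xffu;  /* low 8 bits: what truncation silently loses */
  *exp = e;
}
```

For the magnitude 0x7fffffff from the question, the loop stops with y = 0xfffffffe and exponent 30, so the kept bits are 0xffffff and the dropped bits are 0xfe. Because 0xfe is above the halfway value 0x80, the exact value is closer to the next float up in magnitude, which truncation never selects.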

The usual rule is called "round to nearest, ties to even" (the IEEE 754 default): if the lowest 8 bits of y are greater than 0x80, increase mantissa by 1. If the lowest 8 bits of y are exactly 0x80 and bit 8 of y (the lowest bit of mantissa) is 1, also increase mantissa by 1. In either case, if mantissa becomes >= 0x1000000, shift mantissa right by one and increase the exponent by one.
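Putting that rule into the question's function gives something like the following minimal sketch. `float_i2f_rne` is a hypothetical name; like the question's code it assumes `unsigned` is 32 bits, and it does not try to respect any operator restrictions the assignment may impose.

```c
/* Sketch: int -> binary32 bit pattern using only integer operations,
   rounding to nearest with ties to even. Assumes unsigned is 32 bits. */
unsigned float_i2f_rne(int x) {
  if (x == 0) return 0;

  unsigned sign = 0;
  unsigned y = (unsigned)x;
  if (x < 0) {
    sign = 0x80000000u;
    y = 0u - y;                       /* magnitude; also correct for INT_MIN */
  }

  unsigned exp = 31;
  while ((y & 0x80000000u) == 0) {    /* normalize: leading 1 into bit 31 */
    exp--;
    y <<= 1;
  }

  unsigned mantissa = y >> 8;         /* 24 bits including the implicit 1 */
  unsigned low = y & 0xffu;           /* the 8 bits that do not fit */

  /* round up if above halfway, or exactly halfway with an odd mantissa */
  if (low > 0x80u || (low == 0x80u && (mantissa & 1u))) {
    mantissa++;
    if (mantissa >= 0x1000000u) {     /* carry out of 24 bits */
      mantissa >>= 1;
      exp++;
    }
  }

  return sign | ((exp + 127u) << 23) | (mantissa & 0x7fffffu);
}
```

With this rounding, -2147483647 converts to 0xcf000000, and the tie cases 16777217 and 16777219 convert to 0x4b800000 (16777216) and 0x4b800002 (16777220), the neighbours whose mantissa is even.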

answered on Stack Overflow Sep 8, 2014 by gnasher729

User contributions licensed under CC BY-SA 3.0