What's happening in the background of an unsigned char to integer type cast?

2

I was getting some odd behaviour out of a switch block today, specifically I was reading a byte from a file and comparing it against certain hex values (text file encoding issue, no big deal). The code looked something like:

char BOM[3] = {0};
b_error = ReadFile (iNCfile, BOM, 3, &lpNumberOfBytesRead, NULL); 

switch ( BOM[0] ) {
case 0xef: {
    // Byte Order Marker Potentially Indicates UTF-8
    if ( ( BOM[1] == 0xBB ) && ( BOM[2] == 0xBF ) ) {
        iNCfileEncoding = UTF8;
    }
    break;
}
}

Which didn't work, although the debug looked ok. I realized that the switch was promoting the values to integers, and once that clicked in place I was able to match using 0xffffffef in the case statement. Of course the correct solution was to make BOM[] unsigned and now everything promotes and compares as expected.
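For illustration, the fix described above (an unsigned buffer) could be sketched as below. The helper name has_utf8_bom and its parameters are hypothetical, not from the original code, and the ReadFile call is left out so that the comparison logic stands alone.

```cpp
#include <cstddef>

// Hypothetical helper (not from the original code): returns true when the
// first three bytes of `data` are the UTF-8 byte order mark EF BB BF.
bool has_utf8_bom(const unsigned char* data, std::size_t size) {
    if (size < 3) {
        return false;  // too short to hold a BOM
    }
    // data[0] is an unsigned char, so it promotes to a non-negative int:
    // the byte 0xEF becomes 239 and matches the case label 0xEF.
    switch (data[0]) {
    case 0xEF:
        return data[1] == 0xBB && data[2] == 0xBF;
    default:
        return false;
    }
}
```

Passing a buffer that starts with 0xEF, 0xBB, 0xBF returns true. The same bytes stored in a signed char buffer would not match these labels without a cast.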

Can someone briefly explain what was going on in the char -> int promotion that produced 0xffffffef instead of 0x000000ef?

c++
binary
casting
implicit-conversion
asked on Stack Overflow Jun 6, 2011 by Stephen • edited Aug 23, 2011 by Stephen

5 Answers

3

The sign of your (signed) char got extended to form a signed int. That is because of the way signed values are stored in binary.

Example

1 in binary char = 00000001

1 in binary int = 00000000 00000000 00000000 00000001

-1 in binary char = 11111111

-1 in binary int is NOT 00000000 00000000 00000000 11111111 but 11111111 11111111 11111111 11111111

If you convert back to decimal, you need to know up front whether you are dealing with signed or unsigned values, because 11111111 is -1 as a signed char and 255 as an unsigned char.
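The bit patterns above can be reproduced with a small sketch. The function names are made up for illustration, and the fixed-width types from <cstdint> are used so the widths are 8 and 32 bits as in the example.

```cpp
#include <cstdint>

// Widening a signed 8-bit value copies the sign bit into the new upper bits.
std::int32_t widen_signed(std::int8_t byte) { return byte; }

// Widening an unsigned 8-bit value fills the new upper bits with zeros.
std::int32_t widen_unsigned(std::uint8_t byte) { return byte; }

// View a 32-bit signed value as its unsigned bit pattern (conversion is modulo 2^32).
std::uint32_t bits_of(std::int32_t value) {
    return static_cast<std::uint32_t>(value);
}
```

With these, bits_of(widen_signed(-1)) is 0xFFFFFFFF, while bits_of(widen_unsigned(0xFF)) is 0x000000FF, even though both inputs have the 8-bit pattern 11111111.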

answered on Stack Overflow Jun 6, 2011 by Joris Mans
3

char must be signed on your platform, and what you are seeing is sign extension.

answered on Stack Overflow Jun 6, 2011 by jason
2

What hasn't been stated yet (as I type, anyway) is that it is implementation-defined whether or not char is signed. In your case - as was stated - char is signed, so any byte value above 127 is going to be interpreted as a negative number.
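A minimal sketch of how one might check this on a given platform, and read a byte portably either way. Both function names are hypothetical, and constexpr assumes C++11 or later.

```cpp
#include <limits>

// True on platforms where plain char behaves as a signed type.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Read a byte held in a plain char as a value in 0..255, whether or not
// char is signed: the cast to unsigned char happens before the promotion.
constexpr int byte_value(char c) {
    return static_cast<unsigned char>(c);
}
```

Comparing byte_value(BOM[0]) against 0xEF would then behave the same on signed-char and unsigned-char platforms.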

answered on Stack Overflow Jun 6, 2011 by John
1
1

"Can someone briefly explain what was going on in the char -> int promotion that produced 0xffffffef instead of 0x000000ef?"

Contrary to the four answers so far, it didn't.

Rather, you had a negative char value, which as a switch condition was promoted to the same negative int value as required by

C++98 §6.4.2/2
Integral promotions are performed.

Then with your 32-bit C++ compiler 0xffffffef was interpreted as an unsigned int literal, because it’s too large for a 32-bit int, by

C++98 §2.13.1/2
If it is octal or hexadecimal and has no suffix, it has the first of these types in which it can be represented: int, unsigned int, long int, unsigned long int.

Now, for the case label,

C++98 §6.4.2/2
The integral constant-expression (5.19) is implicitly converted to the promoted type of the switch condition.

In your case, with signed destination type, the result of the conversion is formally implementation-defined, by

C++98 §4.7/3
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.

But in practice nearly all compilers use two's complement representation with no trapping, and so the implementation-defined conversion is in your case that the bit pattern 0xffffffef is interpreted as the two's complement representation of a negative value. You can calculate which value by 0xffffffef - 2^32, because we’re talking 32-bit representation here. Or, since this is just an 8-bit value that’s been sign extended to 32 bits, you can alternatively calculate it as 0xef - 2^8, where 0xef is the character code point. Both give -17.
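The two calculations above can be checked with a short sketch. The names are invented for illustration, and the arithmetic is done in wider types so it does not itself rely on the implementation-defined conversion being discussed.

```cpp
#include <cstdint>

// The case label's bit pattern, held explicitly as a 32-bit unsigned value.
constexpr std::uint32_t kLabelBits = 0xffffffefu;

// Two's complement reading of a 32-bit pattern: subtract 2^32 when the top bit is set.
constexpr std::int64_t as_twos_complement32(std::uint32_t bits) {
    return bits >= 0x80000000u
        ? static_cast<std::int64_t>(bits) - (std::int64_t{1} << 32)
        : static_cast<std::int64_t>(bits);
}

// Two's complement reading of an 8-bit pattern: subtract 2^8 when the top bit is set.
constexpr int as_twos_complement8(std::uint8_t bits) {
    return bits >= 0x80 ? static_cast<int>(bits) - (1 << 8)
                        : static_cast<int>(bits);
}
```

Here as_twos_complement32(kLabelBits) and as_twos_complement8(0xef) both evaluate to -17, which is the value the promoted signed char and the converted case label share.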

Cheers & hth.,

answered on Stack Overflow Jun 6, 2011 by Cheers and hth. - Alf • edited Jun 7, 2011 by Cheers and hth. - Alf

User contributions licensed under CC BY-SA 3.0