Why is long l = 0x80000000 a positive number?


In C++, why is long l = 0x80000000; positive?

C++:
long l = 0x80000000; // l is positive. Why??

int i = 0x80000000;
long l = i; // l is negative

According to this site: https://en.cppreference.com/w/cpp/language/integer_literal, 0x80000000 should be a signed int, but that doesn't appear to be the case, because when it is assigned to l no sign extension occurs.

Java:
long l = 0x80000000; // l is negative

int i = 0x80000000;
long l = i; // l is negative

Java, on the other hand, behaves more consistently.

C++ Test code:

#include <stdio.h>
#include <string.h>

void print_sign(long l) {
    if (l < 0) {
        printf("Negative\n");
    } else if (l > 0) {
        printf("Positive\n");
    } else {
        printf("Zero\n");
    }    
}

int main() {
    long l = -0x80000000;
    print_sign(l); // Positive

    long l2 = 0x80000000;
    print_sign(l2); // Positive

    int i =   0x80000000;
    long l3 = i;
    print_sign(l3); // Negative

    int i2 =  -0x80000000;
    long l4 = i2;
    print_sign(l4); // Negative
}
c++
asked on Stack Overflow May 22, 2020 by No Ordinary Love • edited May 22, 2020 by No Ordinary Love

2 Answers


From your link: "The type of the integer literal is the first type in which the value can fit, from the list of types which depends on which numeric base and which integer-suffix was used." and for hexadecimal values lists int, unsigned int...

Your compiler uses 32-bit ints, so the largest (signed) int is 0x7FFFFFFF. A signed int cannot represent 0x80000000...0xFFFFFFFF because it needs some of the 2^32 possible values of its 32 bits to represent negative numbers. However, 0x80000000 fits in a 32-bit unsigned int. Your compiler uses 64-bit longs, which can hold values up to 0x7FFF FFFF FFFF FFFF, so 0x80000000 also fits in a signed long, and so long l gets the positive value 0x80000000.

On the other hand, int i is a signed int, and the value 0x80000000 (2147483648) simply doesn't fit in it. Converting an out-of-range value to a signed integer type is not undefined behaviour: before C++20 the result is implementation-defined, and since C++20 it is defined to wrap modulo 2^32. What compilers typically do is use two's-complement wrap-around, so the number wraps round to a large negative number. (Do not rely on this in portable pre-C++20 code, and note that genuine signed overflow in arithmetic is undefined behaviour, which optimisers have been known to exploit.) That two's-complement wrap-around is what has happened here, resulting in i being negative.

In your example code you use both 0x80000000 and -0x80000000, and in each case they have the same result. In fact, they are the same value. Recall that 0x80000000 is an unsigned int. The 2003 C++ standard says in 5.3.1 paragraph 7: "The negative of an unsigned quantity is computed by subtracting its value from 2^n, where n is the number of bits in the promoted operand." 0x80000000 is precisely 2^31, so -0x80000000 is 2^32 - 2^31 = 2^31. To get the expected behaviour we would have to use -(long)0x80000000 instead.

answered on Stack Overflow May 22, 2020 by gmatht • edited May 22, 2020 by gmatht

With the help of the awesome people on SO, I think I can answer my own question now:

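Just to correct the notion that 0x80000000 can't fit in an int: the bit pattern 0x80000000 can be stored in an int without loss or undefined behavior (assuming a 32-bit two's-complement int, i.e. sizeof(int) == 4). That bit pattern is INT_MIN, i.e. -2147483648; it is only the positive value 0x80000000 (2147483648) that is out of range. The following code demonstrates this: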

#include <limits.h>
#include <stdio.h>

int main() {
    int i = INT_MIN;
    printf("%X\n", (unsigned)i);  // %X expects unsigned int; prints 80000000
    return 0;
}

Assigning the literal 0x80000000 to a variable is a little more nuanced, though.

What the other answers failed to mention (except @Daniel Langr) is that C++ doesn't have a concept of negative literals.

There are no negative integer literals. Expressions such as -1 apply the unary minus operator to the value represented by the literal, which may involve implicit type conversions.

With this in mind, the literal 0x80000000 is always a positive number. Negation happens after the literal's type (its size and signedness) has been determined. This is important: a minus sign doesn't affect whether the literal is signed or unsigned; only the base, the value, and any suffix do. 0x80000000 is too big to fit in a 32-bit int, so C++ tries the next applicable type, unsigned int, which succeeds. The order of types C++ tries depends on the base of the literal plus any suffix it may have.

The table is listed here: https://en.cppreference.com/w/cpp/language/integer_literal

So with this rule in mind let's work out some examples:

  1. -2147483648: The literal 2147483648 can't fit in an int, so it is a long int (assuming a 64-bit long); the unary minus then yields a long int.
  2. 2147483648: Treated as a long int because C++ doesn't consider unsigned int as a candidate for unsuffixed decimal literals.
  3. 0x80000000: Treated as an unsigned int because C++ considers unsigned int as a candidate for non-decimal literals.
  4. (-2147483647 - 1): Treated as an int. This is typically how INT_MIN is defined, so that the expression keeps the type int. It is the type-safe way of writing -2147483648 as an int.
  5. -0x80000000: Treated as an unsigned int even though there's a negation. Negating an unsigned value is well-defined, not undefined behavior: the result is 2^32 - 0x80000000, which is 0x80000000 again.
  6. -0x80000000l: Treated as a long int (assuming a 64-bit long), so the negation produces the expected negative value.
answered on Stack Overflow May 22, 2020 by No Ordinary Love • edited May 22, 2020 by No Ordinary Love

User contributions licensed under CC BY-SA 3.0