How, exactly, do bitwise operators work in Java?

8

I am currently trying to wrap my head around bitwise and bit shift operators in Java. Although they make sense to me in simplified toy examples (basically positive integers), my understanding falls apart as soon as negatives are involved, and in some other cases. I tried searching all over the Internet with two search engines and I even checked the Java specification. I can't find any source that properly describes how bitwise and bit shift operators work in Java.

One function in the Java standard library that is especially confusing to me is java.lang.Integer.toUnsignedLong(int). The source from OpenJdk is shown here (LGPLv2 with classpath exception), with an excerpt in the Javadoc:

/**
 * Converts the argument to a {@code long} by an unsigned
 * conversion.  In an unsigned conversion to a {@code long}, the
 * high-order 32 bits of the {@code long} are zero and the
 * low-order 32 bits are equal to the bits of the integer
 * argument.   
 */
public static long toUnsignedLong(int x) {
    return ((long) x) & 0xffffffffL;
}

According to the official documentation reproduced above, "the high-order 32 bits of the long are zero and the low-order 32 bits are equal to the bits of the integer argument." I do not see how this follows from the code inside the method body however.

When reading the method, the following is my train of thought for positive x:

  1. When the integer is cast to long, its sign bit / most significant bit is zero. As such, the long's sign bit / most significant bit is zero and the low-order bits are equal to those of the integer.
  2. As the long 0xffffffff has all ones in the lowest-order 4 bytes, and because only these bytes will have data in them, this mask has no effect and the correct result is returned.

When reading it in the context of a negative x however, my understanding falls apart:

  1. When the integer is cst to long, its sign bit / most significant bit is one. As such, the long's sign bit / most significant it is one and the low order bits are equal to those of the integer, except the most significant bit of the fourth least significant byte is zero when it is one in the integer.
  2. As the long 0xffffffff has all ones in the lowest-order 4 bytes and zero in the highest-order four bytes, it has the only effect of changing the sign bit on the long, and keeps the incorrect integer in the four least significant bits intact. As such, it returns a wrong answer from this method, where the sign bit of the integer is changed as it moves into the long.

When I test this method, however, I get results consistent with the Javadoc. I suspect that I am misunderstanding one or more fundamental points about bitwise operators in Java or its two's complement integer representation, and I hope that this question can clarify those points.

java
bit-manipulation
language-lawyer
bitwise-operators
asked on Stack Overflow Apr 10, 2019 by john01dav • edited Apr 10, 2019 by john01dav

1 Answer

5

The bitwise operators work exactly as you would expect. They are strict bit-operators and do not consider semantics of bits at all.

Sometimes it is easiest to run through code using breakpoints. For your specific example, I converted the steps of the operation into atomic statements and printed the results with Long.toString.

int x = -57;

// step 1:
long xCast = (long) x;
System.out.println(Long.toString(xCast, 2)); // -1110011 - this is not the bitwise representation however.

long mask = 0xffffffffL;
System.out.println(Long.toString(mask, 2)); // 11111111111111111111111111111111

// step 2:
long result = ((long) x) & mask;
System.out.println(Long.toString(result, 2)); // 11111111111111111111111111000111

Step 1 is the primary reason the operation looks as it does. In Java, all (strictily numeric) values are signed (chars are unsigned). This means that, as you correctly stated, all highest bits are sign-bits. However the interesting part is what the rest of the bits do, if a number is negative. The following thread already covered the basics of the 'Two's complement': What is “2's Complement”? So does this wikipedia-page: https://en.wikipedia.org/wiki/Two%27s_complement

To make it short, in java, for integers:

int zero = 0; // == 0b00000000_00000000_00000000_00000000

int maxPositive = Integer.MAX_VALUE; // == 0b01111111_11111111_11111111_11111111

int minus1 = -1; // == 0b11111111_11111111_11111111_11111111

int minNegative = Integer.MIN_VALUE; // == 0b10000000_00000000_00000000_00000000

So the reason everything works out is because if the integer is negative, when it is cast, the entire upper 32 bits are converted to 1s, because otherwise the represented value of the number would change. effectively:

int x = 0b11111111_11111111_11111111_11000111;

is cast to:

long xCast = 0b11111111_11111111_11111111_11111111_11111111_11111111_11111111_11000111;

Because you as the developer expect the method to return only the initially set bits, you have to mask the upper bits out of the result. This is done in step 2.

So the answer to your example: The representation of non-floating values in Java are two's complement, and therefore, when smart-casting a value from int to long, the upper bits are filled up with 1s for negative numbers. Thus they have to be removed.

answered on Stack Overflow Apr 10, 2019 by TreffnonX • edited Apr 11, 2019 by TreffnonX

User contributions licensed under CC BY-SA 3.0