Convert 4 bytes to an unsigned 32-bit integer and store it in a long

4

I'm trying to read a binary file in Java. I need methods to read unsigned 8-bit, unsigned 16-bit and unsigned 32-bit values. What would be the best way (fastest, nicest-looking code) to do this? I've done this in C++ with something like this:

uint8_t *buffer;
uint32_t value = buffer[0] | buffer[1] << 8 | buffer[2] << 16 | buffer[3] << 24;

But in Java this causes a problem if, for example, buffer[1] contains a value with its sign bit set, since the byte is sign-extended and the result of the left shift is an int (?). Instead of ORing in only 0xA5 at the intended position, it ORs in 0xFFFFA500 or something like that, which "damages" the two top bytes.
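A minimal sketch of the effect described above, assuming the byte in question is 0xA5. The class and method names are made up for illustration. A Java byte is signed, so it is widened to a negative int before the shift happens; masking with 0xFF first keeps only the low 8 bits.

```java
// Illustrative only: SignExtensionDemo and its methods are hypothetical names.
final class SignExtensionDemo {
    // Shifts the byte as-is; a negative byte is sign-extended to int first.
    static int shiftRaw(byte b) {
        return b << 8;
    }

    // Masks with 0xFF first, so only the low 8 bits survive the widening to int.
    static int shiftMasked(byte b) {
        return (b & 0xFF) << 8;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xA5;
        System.out.println(Integer.toHexString(shiftRaw(b)));    // ffffa500
        System.out.println(Integer.toHexString(shiftMasked(b))); // a500
    }
}
```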

My current code looks like this:

public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    long value = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    return value & 0x00000000FFFFFFFFL;
}

If I want to convert the four bytes 0x67 0xA5 0x72 0x50, the result is 0xFFFFA567 instead of 0x5072A567.

Edit: This works great:

public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    long value = bytes[0] & 0xFF;
    value |= (bytes[1] << 8) & 0xFFFF;
    value |= (bytes[2] << 16) & 0xFFFFFF;
    value |= (bytes[3] << 24) & 0xFFFFFFFFL; // long mask; the int literal 0xFFFFFFFF is -1 and would keep the sign extension
    return value;
}

But isn't there a better way to do this? Ten bit operations seem a "bit" much for a simple thing like this. (See what I did there?) =)

java
bit-manipulation
asked on Stack Overflow Nov 2, 2012 by simon • edited Nov 2, 2012 by simon

2 Answers

4

A more regular version converts the bytes to their unsigned values as integers first:

public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    long value = 
        ((bytes[0] & 0xFF) <<  0) |
        ((bytes[1] & 0xFF) <<  8) |
        ((bytes[2] & 0xFF) << 16) |
        ((long) (bytes[3] & 0xFF) << 24);
    return value;
}
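The same mask-then-shift pattern also covers the unsigned 8-bit and 16-bit reads the question asks for. A rough sketch, assuming little-endian data as in the question and reading from a byte array at an offset. The class name, method names and signatures are invented for this example and are not part of the original code.

```java
// Illustrative only: UnsignedReaders and its methods are hypothetical helpers.
final class UnsignedReaders {
    // Unsigned 8-bit value, returned as an int in the range 0..255.
    static int uint8(byte[] buf, int off) {
        return buf[off] & 0xFF;
    }

    // Unsigned 16-bit little-endian value, returned as an int in the range 0..65535.
    static int uint16LE(byte[] buf, int off) {
        return (buf[off] & 0xFF) | ((buf[off + 1] & 0xFF) << 8);
    }

    // Unsigned 32-bit little-endian value, returned as a long.
    static long uint32LE(byte[] buf, int off) {
        return (buf[off] & 0xFF)
             | ((buf[off + 1] & 0xFF) << 8)
             | ((buf[off + 2] & 0xFF) << 16)
             | ((long) (buf[off + 3] & 0xFF) << 24); // widen before the shift
    }
}
```

With the bytes from the question, 0x67 0xA5 0x72 0x50, uint32LE(buf, 0) gives 0x5072A567.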

Don't get hung up on the number of bit operations; most likely the compiler will optimize those to byte operations.

Also, you shouldn't use long for 32-bit values just to avoid the sign; most of the time you can use int and ignore the fact that it is signed. See this answer.
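As a sketch of what "ignore the sign" can look like in practice: on Java 8 and later (newer than this question), Integer has static helpers that treat an int as unsigned only where it matters, such as printing or comparing. The class and the readLE helper below are made up for illustration.

```java
// Illustrative only: UnsignedIntDemo and readLE are hypothetical names.
final class UnsignedIntDemo {
    // Assembles four little-endian bytes into an int; the bit pattern is what counts.
    static int readLE(byte[] b) {
        return (b[0] & 0xFF)
             | ((b[1] & 0xFF) << 8)
             | ((b[2] & 0xFF) << 16)
             | ((b[3] & 0xFF) << 24);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        int raw = readLE(bytes);                                // -1 when viewed as signed
        System.out.println(Integer.toUnsignedString(raw));      // 4294967295
        System.out.println(Integer.toUnsignedLong(raw));        // 4294967295
        System.out.println(Integer.compareUnsigned(raw, 1) > 0); // true
    }
}
```

Addition, subtraction, multiplication and the bitwise operators give the same bit pattern for signed and unsigned ints, so the helpers are only needed for division, remainder, comparison and conversion to text or a wider type.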

Update: The cast to long for the most significant byte is needed, because its most significant bit would otherwise be shifted into the sign bit of a 32-bit integer, potentially making it negative.
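To make the effect of the cast concrete, here is a small comparison, assuming a top byte of 0x80. The names are invented for this sketch. Without the cast the shift is done in 32-bit int arithmetic, and the negative result is sign-extended when it is widened to long.

```java
// Illustrative only: TopByteCastDemo and its methods are hypothetical names.
final class TopByteCastDemo {
    // Shift happens as int: bit 7 of the byte lands in the int sign bit.
    static long withoutCast(byte top) {
        return (top & 0xFF) << 24;
    }

    // Shift happens as long: bit 31 is an ordinary value bit.
    static long withCast(byte top) {
        return (long) (top & 0xFF) << 24;
    }

    public static void main(String[] args) {
        byte top = (byte) 0x80;
        System.out.println(withoutCast(top)); // -2147483648
        System.out.println(withCast(top));    // 2147483648
    }
}
```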

answered on Stack Overflow Mar 3, 2013 by starblue • edited May 8, 2021 by starblue
2

You've got the right idea; I don't think there's any obvious improvement. If you look at the java.io.DataInput.readInt spec, it gives code for the same thing. It switches the order of << and & (masking each byte before shifting), but is otherwise the same standard approach.

There is no way to read an int in one go from a byte array without going through a buffer abstraction such as java.nio.ByteBuffer or a memory-mapped region, and the latter is way overkill for this.
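For what it's worth, java.nio.ByteBuffer can wrap an existing byte array and read an int with a chosen byte order, which avoids writing the shifts by hand. A minimal sketch, assuming little-endian input as in the question; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: ByteBufferUInt32 and uint32LE are hypothetical names.
final class ByteBufferUInt32 {
    // Reads the first four bytes as a little-endian unsigned 32-bit value.
    static long uint32LE(byte[] bytes) {
        int signed = ByteBuffer.wrap(bytes)
                               .order(ByteOrder.LITTLE_ENDIAN)
                               .getInt();
        return signed & 0xFFFFFFFFL; // drop the sign extension from widening to long
    }

    public static void main(String[] args) {
        byte[] bytes = {0x67, (byte) 0xA5, 0x72, 0x50};
        System.out.println(Long.toHexString(uint32LE(bytes))); // 5072a567
    }
}
```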

Of course, you could use a DataInputStream directly instead of reading into a byte[] first:

DataInputStream d = new DataInputStream(new FileInputStream("myfile"));
d.readInt();

DataInputStream reads big-endian values, the opposite of the byte order you are using, so you'll also need some Integer.reverseBytes calls. It won't be any faster, but it's cleaner.
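Put together, that approach might look roughly like the sketch below. It reads from an in-memory ByteArrayInputStream instead of a file so that it is self-contained. The class and method names are invented for this example, and the fromBytes helper exists only to make the sketch easy to call.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: DataInputUInt32 and its methods are hypothetical names.
final class DataInputUInt32 {
    // readInt() is big-endian, so swap the bytes to get the little-endian value.
    static long readUInt32LE(DataInputStream in) throws IOException {
        int bigEndian = in.readInt();
        int littleEndian = Integer.reverseBytes(bigEndian);
        return littleEndian & 0xFFFFFFFFL; // treat the 32 bits as unsigned
    }

    // Convenience wrapper for in-memory data; rethrows I/O errors unchecked.
    static long fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return readUInt32LE(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x67, (byte) 0xA5, 0x72, 0x50};
        System.out.println(Long.toHexString(fromBytes(data))); // 5072a567
    }
}
```

DataInputStream also has readUnsignedByte and readUnsignedShort for the 8-bit and 16-bit cases; readUnsignedShort is big-endian as well, so its two bytes would need swapping too.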

answered on Stack Overflow Nov 2, 2012 by Keith Randall • edited Nov 2, 2012 by Keith Randall

User contributions licensed under CC BY-SA 3.0