I'm trying to read a binary file in Java. I need methods to read unsigned 8-bit, 16-bit, and 32-bit values. What would be the best way (fastest, nicest-looking code) to do this? I've done this in C++ with something like this:
uint8_t *buffer;
uint32_t value = buffer[0] | buffer[1] << 8 | buffer[2] << 16 | buffer[3] << 24;
But in Java this causes a problem if, for example, buffer[1] contains a value with its sign bit set, since the operand of a left shift is promoted to an int (?). Instead of ORing in only 0xA5 at the specific place, it ORs in 0xFFFFA500 or something like that, which "damages" the two top bytes.
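A minimal sketch of that promotion issue (the class name here is just for illustration):

public class SignExtensionDemo {
    public static void main(String[] args) {
        byte b = (byte) 0xA5;                            // stored as -91
        System.out.printf("0x%08X%n", b << 8);           // promoted to int with sign extension: 0xFFFFA500
        System.out.printf("0x%08X%n", (b & 0xFF) << 8);  // mask to 0..255 first: 0x0000A500
    }
}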
I have code right now that looks like this:
public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    long value = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    return value & 0x00000000FFFFFFFFL;
}
If I want to convert the four bytes 0x67 0xA5 0x72 0x50, the result is 0xFFFFA567 instead of 0x5072A567.
Edit: This works great:
public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    long value = bytes[0] & 0xFF;
    value |= (bytes[1] << 8) & 0xFFFF;
    value |= (bytes[2] << 16) & 0xFFFFFF;
    value |= (bytes[3] << 24) & 0xFFFFFFFF;
    return value;
}
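A quick standalone check of that version against the example bytes (the class name and hard-coded array are only for illustration):

public class UInt32Check {
    public static void main(String[] args) {
        byte[] bytes = { 0x67, (byte) 0xA5, 0x72, 0x50 };
        long value = bytes[0] & 0xFF;
        value |= (bytes[1] << 8) & 0xFFFF;
        value |= (bytes[2] << 16) & 0xFFFFFF;
        value |= (bytes[3] << 24) & 0xFFFFFFFF;
        System.out.printf("0x%08X%n", value);  // prints 0x5072A567
        // note: if bytes[3] had its high bit set, the int result of the last
        // OR would be negative and sign-extend into the upper half of the long
    }
}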
But isn't there a better way to do this? Ten bit operations seem a "bit" much for something as simple as this... (See what I did there?) =)
A more regular version converts the bytes to their unsigned values as integers first:
public long getUInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    long value =
        ((bytes[0] & 0xFF) << 0) |
        ((bytes[1] & 0xFF) << 8) |
        ((bytes[2] & 0xFF) << 16) |
        ((long) (bytes[3] & 0xFF) << 24);
    return value;
}
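For the 8- and 16-bit readers mentioned in the question, the same masking pattern works and the results fit in a plain int (a sketch, assuming the same getBytes helper):

public int getUInt8() throws EOFException, IOException {
    byte[] bytes = getBytes(1);
    return bytes[0] & 0xFF;                 // 0..255 always fits in an int
}

public int getUInt16() throws EOFException, IOException {
    byte[] bytes = getBytes(2);
    return (bytes[0] & 0xFF) |
           ((bytes[1] & 0xFF) << 8);        // 0..65535 always fits in an int
}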
Don't get hung up on the number of bit operations; most likely the compiler will optimize those to byte operations anyway. Also, you shouldn't be using long for 32-bit values just to avoid the sign: you can use int and ignore the fact that it is signed most of the time. See this answer.
Update: The cast to long for the most significant byte is needed because its most significant bit would otherwise be shifted into the sign bit of a 32-bit int, making the intermediate result negative, and that sign would then spill into the upper half of the long.
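A sketch of that int-based approach, assuming the same getBytes helper (the method name getInt32 is only illustrative):

// returns the raw 32 bits; the value may be negative, but the bit pattern
// is exactly the little-endian value read from the file
public int getInt32() throws EOFException, IOException {
    byte[] bytes = getBytes(4);
    return (bytes[0] & 0xFF) |
           ((bytes[1] & 0xFF) << 8) |
           ((bytes[2] & 0xFF) << 16) |
           ((bytes[3] & 0xFF) << 24);
}

Callers that actually need the unsigned value can widen at the point of use with getInt32() & 0xFFFFFFFFL.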
You've got the right idea; I don't think there's any obvious improvement. If you look at the java.io.DataInput.readInt spec, they have code for the same thing. They switch the order of << and &, but it's otherwise standard.
There is no way to read an int in one go from a byte array, unless you use a memory-mapped region, which is way overkill for this.
Of course, you could use a DataInputStream directly instead of reading into a byte[] first:
DataInputStream d = new DataInputStream(new FileInputStream("myfile"));
d.readInt();
DataInputStream uses the opposite endianness from the one you are using, so you'll need some Integer.reverseBytes calls as well. It won't be any faster, but it's cleaner.
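A sketch of that combination (the file name and the unsigned masking are illustrative assumptions): readInt returns a big-endian signed int, Integer.reverseBytes flips it to little-endian, and masking with 0xFFFFFFFFL gives the unsigned value as a long.

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadLittleEndian {
    public static void main(String[] args) throws IOException {
        DataInputStream d = new DataInputStream(new FileInputStream("myfile"));
        try {
            int bigEndian = d.readInt();                                  // DataInputStream reads big-endian
            long uint32 = Integer.reverseBytes(bigEndian) & 0xFFFFFFFFL;  // flip to little-endian, drop the sign
            System.out.printf("0x%08X%n", uint32);
        } finally {
            d.close();
        }
    }
}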