I'm implementing variable lenght encoding and reading wikipedia about it. Here is what I found:
0x00000080 0x81 0x00
It mean 0x80
int is encoded as 0x81 0x00
2 bytes. That what I cannot understand. Okay, following the algorithm listed there we have.
0x80
: 00000000 00000000 00000000 10000000
We move the sign bit to the next octet so we have and set to 1 (indicating that we have more octets):
00000000 00000000 00000001 10000000
which is not equals to 0x81 0x00
. I tried to write a program for that:
byte[] ba = new byte[]{(byte) 0x81, (byte) 0x00};
int first = (ba[0] & 0xFF) & 0x7F;
int second = ((ba[1] & 0xFF) & 0x7F) << 7;
int result = first | second;
System.out.println(result); //prints 1, not 0x80
What did I miss?
Another Variable length encoding of integers exists and are widely used. For example ASN.1 from 1984 does define "length" field as:
The encoding of length can take two forms: short or long. The short form is a single byte, between 0 and 127.
The long form is at least two bytes long, and has bit 8 of the first byte set to 1. Bits 7-1 of the first byte indicate how many more bytes are in the length field itself. Then the remaining bytes specify the length itself, as a multi-byte integer.
This encoding is used for example in DLMS COSEM protocol or https certificates. For simple code, you can have a look at ASN.1 java library.
Let's review the algorithm from the Wikipedia page:
We can implement the algorithm like this:
public static byte[] variableLengthInteger(int input) {
// first find out how many bytes we need to represent the integer
int numBytes = ((32 - Integer.numberOfLeadingZeros(input)) + 6) / 7;
// if the integer is 0, we still need 1 byte
numBytes = numBytes > 0 ? numBytes : 1;
byte[] output = new byte[numBytes];
// for each byte of output ...
for(int i = 0; i < numBytes; i++) {
// ... take the least significant 7 bits of input and set the MSB to 1 ...
output[i] = (byte) ((input & 0b1111111) | 0b10000000);
// ... shift the input right by 7 places, discarding the 7 bits we just used
input >>= 7;
}
// finally reset the MSB on the last byte
output[0] &= 0b01111111;
return output;
}
You can see it working for the examples from the Wikipedia page here, you can also plug in your own values and try it online.
User contributions licensed under CC BY-SA 3.0