Variable length-encoding of int to 2 bytes

Question

Variable length-encoding of int to 2 bytes

I'm implementing variable lenght encoding and reading wikipedia about it. Here is what I found:

0x00000080  0x81 0x00

It mean 0x80 int is encoded as 0x81 0x00 2 bytes. That what I cannot understand. Okay, following the algorithm listed there we have.

Binary 0x80: 00000000 00000000 00000000 10000000

We move the sign bit to the next octet so we have and set to 1 (indicating that we have more octets): 00000000 00000000 00000001 10000000 which is not equals to 0x81 0x00. I tried to write a program for that:

byte[] ba = new byte[]{(byte) 0x81, (byte) 0x00};
int first = (ba[0] & 0xFF) & 0x7F;
int second = ((ba[1] & 0xFF) & 0x7F) << 7;
int result = first | second;
System.out.println(result); //prints 1, not 0x80

ideone

What did I miss?

java

int

byte

primitive

asked on Stack Overflow Jun 11, 2018 by

Some Name • edited Jun 11, 2018 by

Some Name

2 Answers

Another Variable length encoding of integers exists and are widely used. For example ASN.1 from 1984 does define "length" field as:

The encoding of length can take two forms: short or long. The short form is a single byte, between 0 and 127.

The long form is at least two bytes long, and has bit 8 of the first byte set to 1. Bits 7-1 of the first byte indicate how many more bytes are in the length field itself. Then the remaining bytes specify the length itself, as a multi-byte integer.

This encoding is used for example in DLMS COSEM protocol or https certificates. For simple code, you can have a look at ASN.1 java library.

answered on Stack Overflow Sep 4, 2020 by

Lubo • edited Sep 4, 2020 by

Lubo

Let's review the algorithm from the Wikipedia page:

Take the binary representation of the integer
Split it into groups of 7 bits, the group with the highest value will have less
Take these seven bits as a byte, setting the MSB (most significant bit) to 1 for all but the last; leave it 0 for the last one

We can implement the algorithm like this:

public static byte[] variableLengthInteger(int input) {
    // first find out how many bytes we need to represent the integer
    int numBytes = ((32 - Integer.numberOfLeadingZeros(input)) + 6) / 7;
    // if the integer is 0, we still need 1 byte
    numBytes = numBytes > 0 ? numBytes : 1;
    byte[] output = new byte[numBytes];
    // for each byte of output ...
    for(int i = 0; i < numBytes; i++) {
        // ... take the least significant 7 bits of input and set the MSB to 1 ...
        output[i] = (byte) ((input & 0b1111111) | 0b10000000);
        // ... shift the input right by 7 places, discarding the 7 bits we just used
        input >>= 7;
    }
    // finally reset the MSB on the last byte
    output[0] &= 0b01111111; 
    return output;
}

You can see it working for the examples from the Wikipedia page here, you can also plug in your own values and try it online.

answered on Stack Overflow Jun 11, 2018 by

O.O.Balance

User contributions licensed under CC BY-SA 3.0