Recreating JS bitwise integer handling in Python3

Question

Recreating JS bitwise integer handling in Python3

I need to translate a hash function from JavaScript to Python.

The function is as follows:

function getIndex(string) {
        var length = 27;
        string = string.toLowerCase();
        var hash = 0;
        for (var i = 0; i < string.length; i++) {
                hash = string.charCodeAt(i) + (hash << 6) + (hash << 16) - hash;
        }
        var index = Math.abs(hash % length);
        return index;
}

console.log(getIndex(window.prompt("Enter a string to hash")));

This function is Objectively Correct™. It is perfection itself. I can't change it, I just have to recreate it. Whatever it outputs, my Python script must also output.

However - I'm having a couple of problems, and I think that it's all to do with the way that the two languages handle signed integers.

JS bitwise operators treat their operands as a sequence of 32 bits. Python, however, has no concept of bit limitation and just keeps going like an absolute madlad. I think that this is the one relevant difference between the two languages.

I can limit the length of hash in Python by masking it to 32 bits with hash & 0xFFFFFFFF.

I can also negate hash if it's above 0x7FFFFFFF with hash = hash ^ 0xFFFFFFFF (or hash = ~hash - they both seem to do the same thing). I believe that this simulates negative numbers.

I apply both of these restrictions to the hash with a function called t.

Here's my Python code so far:

def nickColor(string):
    length = 27

    def t(x):
        x = x & 0xFFFFFFFF
        if x > 0x7FFFFFFF:
            x = x ^ 0xFFFFFFFF
        return x

    string = string.lower()
    hash = t(0)
    for letter in string:
        hash = t(hash)
        hash = t(t(ord(letter)) + t(hash << 6) + t(hash << 16) - t(hash))
    index = hash % length
    return index

It seems to work up until the point that a hash needs to become negative, at which point the two scripts diverge. This normally happens about 4 letters into the string.

I'm assuming that my problem lies in recreating JS negative numbers in Python. How can I say bye to this problem?

python-3.x

hash

integer

binary-operators

asked on Stack Overflow Mar 8, 2019 by

snazzybouche • edited Mar 9, 2019 by

snazzybouche

1 Answer

Here is a working translation:

def nickColor(string):
    length = 27

    def t(x):
        x &= 0xFFFF_FFFF
        if x > 0x7FFF_FFFF:
            x -= 0x1_0000_0000
        return float(x)

    bytes = string.lower().encode('utf-16-le')
    hash = 0.0
    for i in range(0, len(bytes), 2):
        char_code = bytes[i] + 256*bytes[i+1]
        hash = char_code + t(int(hash) << 6) + t(int(hash) << 16) - hash
    return int(hash % length if hash >= 0 else abs(hash % length - length))

The point is, only the shifts (<<) are calculated as 32-bit integer operations, their result is converted back to double before entering additions and subtractions. I'm not familiar with the rules for double-precision floating point representation in the two languages, but it's safe to assume that on all personal computing devices and web servers it is the same for both languages, namely double-precision IEEE 754. For very long strings (thousands of characters) the hash could lose some bits of precision, which of course affects the final result, but in the same way in JS as in Python (not what the author of the Objectively Correct™ function intended, but that's the way it is…). The last line corrects for the different definition of the % operator for negative operands in JavaScript and Python.

Furthermore (thanks to Mark Ransom for reminding me this), to fully emulate JavaScript, it is also necessary to consider its encoding, which is UTF-16, but with surrogate pairs handled as if they consisted of 2 characters. Encoding the string as utf-16-le you make sure that the first byte in each 16-bit “word” is the least significant one, plus, you don't get the BOM that you would get if you used utf-16 tout court (thank you Martijn Pieters).

answered on Stack Overflow Mar 12, 2019 by

Walter Tross • edited Mar 13, 2019 by

Walter Tross

User contributions licensed under CC BY-SA 3.0