How __hash__ is implemented in Python 3.2?

4

I want to make custom object hash-able (via pickling). I could find __hash__ algorithm for Python 2.x (see code below), but it obviously differs from hash for Python 3.2 (I wonder why?). Does anybody know how __hash__ implemented in Python 3.2?

#Version: Python 3.2

def c_mul(a, b):
    #C type multiplication
    return eval(hex((int(a) * b) & 0xFFFFFFFF)[:-1])

class hs:
    #Python 2.x algorithm for hash from http://effbot.org/zone/python-hash.htm
    def __hash__(self):
        if not self:
            return 0 # empty
        value = ord(self[0]) << 7
        for char in self:
            value = c_mul(1000003, value) ^ ord(char)
        value = value ^ len(self)
        if value == -1:
            value = -2
        return value


def main():
    s = ["PROBLEM", "PROBLEN", "PROBLEO", "PROBLEP"]#, "PROBLEQ", "PROBLER", "PROBLES"]
    print("Python 3.2 hash() bild-in")
    for c in s[:]: print("hash('", c, "')=", hex(hash(c)),  end="\n")
    print("\n")
    print("Python 2.x type hash: __hash__()")
    for c in s[:]: print("hs.__hash__('", c, "')=", hex(hs.__hash__(c)),  end="\n")


if __name__ == "__main__":
    main()

OUTPUT:
Python 3.2 hash() bild-in
hash(' PROBLEM ')= 0x7a8e675a
hash(' PROBLEN ')= 0x7a8e6759
hash(' PROBLEO ')= 0x7a8e6758
hash(' PROBLEP ')= 0x7a8e6747


Python 2.x type hash: __hash__()
hs.__hash__(' PROBLEM ')= 0xa638a41
hs.__hash__(' PROBLEN ')= 0xa638a42
hs.__hash__(' PROBLEO ')= 0xa638a43
hs.__hash__(' PROBLEP ')= 0xa638a5c
python
algorithm
hash
asked on Stack Overflow May 15, 2011 by Nikiton • edited Dec 31, 2019 by pppery

2 Answers

5

The answer why they are different is written there:

Hash values are now values of a new type, Py_hash_t, which is defined to be the same size as a pointer. Previously they were of type long, which on some 64-bit operating systems is still only 32 bits long.

The hashing also consider new values to be calculate, take a look at

 sys.hash_info 

For strings, you can take a look at http://svn.python.org/view/python/trunk/Objects/stringobject.c?view=markup line 1263 string_hash(PyStringObject *a)

answered on Stack Overflow May 15, 2011 by Pih • edited May 15, 2011 by Pih
3

I looked up the new function in the source (in unicodeobject.c) and rebuilt it in Python. Here it is:

def my_hash(string):
    x = ord(string[0]) << 7
    for c in string:
        x = (1000003 * x) ^ ord(c)
    x ^= len(string)
    needCorrection =  x & (1 << 65)
    x %= 2 ** 64
    if needCorrection:
        x = -~(-x ^ 0xFFFFFFFFFFFFFFFF)
    if x == -1:
        x = -2
    return x

This is 64-bit only, though. Now with correction for Python's weird behavior when numbers become negative. (You better don't think about this too much.)

answered on Stack Overflow May 15, 2011 by cemper93 • edited May 15, 2011 by cemper93

User contributions licensed under CC BY-SA 3.0