I want to make custom object hash-able (via pickling). I could find
__hash__ algorithm for Python 2.x (see code below), but it obviously differs from hash for Python 3.2 (I wonder why?). Does anybody know how
__hash__ implemented in Python 3.2?
#Version: Python 3.2 def c_mul(a, b): #C type multiplication return eval(hex((int(a) * b) & 0xFFFFFFFF)[:-1]) class hs: #Python 2.x algorithm for hash from http://effbot.org/zone/python-hash.htm def __hash__(self): if not self: return 0 # empty value = ord(self) << 7 for char in self: value = c_mul(1000003, value) ^ ord(char) value = value ^ len(self) if value == -1: value = -2 return value def main(): s = ["PROBLEM", "PROBLEN", "PROBLEO", "PROBLEP"]#, "PROBLEQ", "PROBLER", "PROBLES"] print("Python 3.2 hash() bild-in") for c in s[:]: print("hash('", c, "')=", hex(hash(c)), end="\n") print("\n") print("Python 2.x type hash: __hash__()") for c in s[:]: print("hs.__hash__('", c, "')=", hex(hs.__hash__(c)), end="\n") if __name__ == "__main__": main()
OUTPUT: Python 3.2 hash() bild-in hash(' PROBLEM ')= 0x7a8e675a hash(' PROBLEN ')= 0x7a8e6759 hash(' PROBLEO ')= 0x7a8e6758 hash(' PROBLEP ')= 0x7a8e6747 Python 2.x type hash: __hash__() hs.__hash__(' PROBLEM ')= 0xa638a41 hs.__hash__(' PROBLEN ')= 0xa638a42 hs.__hash__(' PROBLEO ')= 0xa638a43 hs.__hash__(' PROBLEP ')= 0xa638a5c
The answer why they are different is written there:
Hash values are now values of a new type, Py_hash_t, which is defined to be the same size as a pointer. Previously they were of type long, which on some 64-bit operating systems is still only 32 bits long.
The hashing also consider new values to be calculate, take a look at
For strings, you can take a look at http://svn.python.org/view/python/trunk/Objects/stringobject.c?view=markup line 1263 string_hash(PyStringObject *a)
I looked up the new function in the source (in unicodeobject.c) and rebuilt it in Python. Here it is:
def my_hash(string): x = ord(string) << 7 for c in string: x = (1000003 * x) ^ ord(c) x ^= len(string) needCorrection = x & (1 << 65) x %= 2 ** 64 if needCorrection: x = -~(-x ^ 0xFFFFFFFFFFFFFFFF) if x == -1: x = -2 return x
This is 64-bit only, though. Now with correction for Python's weird behavior when numbers become negative. (You better don't think about this too much.)
User contributions licensed under CC BY-SA 3.0