I have been working on a Python-based TCP/IP stack project for a while, and at the moment I am trying to optimize certain parts of it. One of the most frequently run, computation-intensive pieces of code is the function that calculates the Internet checksum for packets. After multiple versions, I think I have finally reached the point where I have no idea what else I could improve in it. So far I have been able to eliminate almost all copying and slicing of the packet, both when data is passed to the function and when padding is needed. The function also uses 64-bit words to compute the sum, and it has a special case for the IPv4 checksum (20-byte length), since that is the most commonly computed one.
Is there anything else I might have missed, or could do better, to improve the performance of this function?
Just please don't tell me I shouldn't use Python for this. The whole point of this project is to use Python, and I have already written a similar function in Assembly for another project, so that's covered already ;)
import struct


def inet_cksum_fast(data, dptr, dlen, init=0):
    """ Compute Internet Checksum used by IPv4/ICMPv4/ICMPv6/UDP/TCP protocols """

    if dlen == 20:
        # Special case: 20-byte IPv4 header, summed as five 32-bit words
        cksum = init + sum(struct.unpack_from("!5L", data, dptr))
    else:
        # General case: sum as many whole 64-bit words as fit, without slicing
        cksum = init + sum(struct.unpack_from(f"!{dlen >> 3}Q", data, dptr))
        if remainder := dlen & 7:
            # Zero-pad the trailing 1..7 bytes to a full 64-bit word
            cksum += struct.unpack("!Q", data[dptr + dlen - remainder : dptr + dlen] + b"\0" * (8 - remainder))[0]
        cksum = (cksum >> 64) + (cksum & 0xFFFFFFFFFFFFFFFF)

    # Fold the sum down to 16 bits, then take the one's complement
    cksum = (cksum >> 32) + (cksum & 0xFFFFFFFF)
    cksum = (cksum >> 16) + (cksum & 0xFFFF)
    return ~(cksum + (cksum >> 16)) & 0xFFFF
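
For anyone who wants to cross-check an optimized version, here is a minimal sketch of a plain 16-bit reference implementation. It is not part of my project: the name `inet_cksum_ref` is hypothetical, it assumes the same `(data, dptr, dlen, init)` arguments as the function above, and it deliberately copies and slices because it is meant for correctness checks only, not speed. The first test value is the worked example from RFC 1071.

```python
import struct


def inet_cksum_ref(data, dptr, dlen, init=0):
    """Slow 16-bit reference checksum (hypothetical helper, for testing only)."""

    # Copy the region of interest; acceptable here because speed is not the goal
    chunk = bytes(data[dptr : dptr + dlen])

    # Pad odd-length input with a single zero byte on the right
    if len(chunk) & 1:
        chunk += b"\0"

    # Sum all big-endian 16-bit words
    cksum = init + sum(struct.unpack(f"!{len(chunk) >> 1}H", chunk))

    # Fold carries back in until the sum fits in 16 bits
    while cksum >> 16:
        cksum = (cksum >> 16) + (cksum & 0xFFFF)

    # One's complement of the folded sum
    return ~cksum & 0xFFFF


# Worked example from RFC 1071: words 0001 f203 f4f5 f6f7
sample = bytes.fromhex("0001f203f4f5f6f7")
assert inet_cksum_ref(sample, 0, len(sample)) == 0x220D
```

Feeding the same random buffers, offsets and lengths to both this helper and `inet_cksum_fast` and comparing the results should catch most mistakes in the padding or folding logic.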