I'm trying to recreate the x86 packed integer add instruction (PADDD) in Python, specifically for two 128-bit operands in a file I'm reversing.

I'm going off this definition

The issue I'm running into is that two bytes of the result are wrong, each one over or under by 1. The result when running the actual PADDD in a real ELF (where the two operands are 33:37:38:32:36:00:00:00:34:00:35:39:00:00:00:31 and ef:be:ad:de:ad:de:e1:fe:37:13:37:13:66:74:63:67) is 22:f6:e5:10:e3:de:e1:fe:6b:13:6c:4c:66:74:63:98. Notice f6:e5 in the second and third bytes.

When I run my Python recreation (same operands) I get 22:f5:e6:10:e3:de:e1:fe:6b:13:6c:4c:66:74:63:98. Notice f5:e6 in the second and third bytes; every other byte matches.
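Since only the first four bytes differ, it may help to add just that 32-bit lane by hand under both byte orders. The following is a minimal sketch, assuming the operands above are listed in memory order; `add_dword` and the variable names are illustrative and not part of my original code.

```python
MASK = 0xFFFFFFFF  # keep only the low 32 bits, as a 32-bit lane would

# First four bytes of each operand, in the order they appear above.
a = bytes([0x33, 0x37, 0x38, 0x32])
b = bytes([0xef, 0xbe, 0xad, 0xde])

def add_dword(x, y, byteorder):
    """Add two 4-byte values under the given byte order, wrapping at 2**32."""
    total = (int.from_bytes(x, byteorder) + int.from_bytes(y, byteorder)) & MASK
    return total.to_bytes(4, byteorder)

big = add_dword(a, b, 'big')
little = add_dword(a, b, 'little')

print(big.hex())     # 22f5e610 (what my Python code produces)
print(little.hex())  # 22f6e510 (what the real PADDD produces)
```

The only thing that changes between the two calls is the byte order, which changes the direction the carries travel between bytes.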

Here's the Python code:

source = bytearray([0xef,0xbe,0xad,0xde,0xad,0xde,0xe1,0xfe,0x37,0x13,0x37,0x13,0x66,0x74,0x63,0x67])
dest = bytearray([0x33,0x37,0x38,0x32,0x36,0x00,0x00,0x00,0x34,0x00,0x35,0x39,0x00,0x00,0x00,0x31])

def PADDD(dst, src, length):
    mask = 0xFFFFFFFF
    word1 = int.from_bytes(dst[:4], 'big') + int.from_bytes(src[:4], 'big')
    word2 = int.from_bytes(dst[4:8], byteorder='big', signed=False) + int.from_bytes(src[4:8], byteorder='big', signed=False)
    word3 = int.from_bytes(dst[8:12], byteorder='big', signed=False) + int.from_bytes(src[8:12], byteorder='big', signed=False)
    word4 = int.from_bytes(dst[12:], byteorder='big', signed=False) + int.from_bytes(src[12:], byteorder='big', signed=False)
    res = word1 & mask
    res <<= 32
    res |= word2 & mask
    res <<= 32
    res |= word3 & mask
    res <<= 32
    res |= word4 & mask
    return bytearray(res.to_bytes(16, 'big'))

print(PADDD(dest, source, 16).hex())

I can't for the life of me figure out what might be going wrong, especially since it's only two bytes that are wrong.
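One thing worth checking is byte order: x86 is little-endian, so each 32-bit lane of a PADDD operand in memory would have its least significant byte first. Below is a hedged sketch of the same addition with every lane read and written as little-endian. It assumes the byte strings in this post are in memory order, and `paddd_le` is a hypothetical name for illustration.

```python
def paddd_le(dst, src):
    """Add four packed 32-bit lanes, wrapping each lane modulo 2**32.

    Assumes both operands are 16 bytes in memory order (little-endian lanes).
    """
    mask = 0xFFFFFFFF
    out = bytearray()
    for i in range(0, 16, 4):
        a = int.from_bytes(dst[i:i + 4], 'little')
        b = int.from_bytes(src[i:i + 4], 'little')
        # Mask off the carry out of the lane so it never reaches its neighbour.
        out += ((a + b) & mask).to_bytes(4, 'little')
    return out

source = bytearray([0xef, 0xbe, 0xad, 0xde, 0xad, 0xde, 0xe1, 0xfe,
                    0x37, 0x13, 0x37, 0x13, 0x66, 0x74, 0x63, 0x67])
dest = bytearray([0x33, 0x37, 0x38, 0x32, 0x36, 0x00, 0x00, 0x00,
                  0x34, 0x00, 0x35, 0x39, 0x00, 0x00, 0x00, 0x31])

result = paddd_le(dest, source)
print(result.hex())  # 22f6e510e3dee1fe6b136c4c66746398
```

With these operands the sketch reproduces the bytes from the real ELF, which suggests the mismatch comes from interpreting the lanes as big-endian. The other three lanes happen to agree under either byte order for this input, because none of their byte-wise additions carries into a neighbouring byte.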

Dec 8, 2020

