Recreating x86 PADDD (packed integer add) in Python


I'm trying to recreate the x86 packed integer add instruction (PADDD) in Python, specifically for two 128-bit operands in a file I'm reversing.

I'm going off this definition: https://mudongliang.github.io/x86/html/file_module_x86_id_226.html

The issue I'm running into is that two bytes of my result are each off by one, in opposite directions. Running the actual PADDD in a real ELF, where the two operands are 33:37:38:32:36:00:00:00:34:00:35:39:00:00:00:31 and ef:be:ad:de:ad:de:e1:fe:37:13:37:13:66:74:63:67, gives 22:f6:e5:10:e3:de:e1:fe:6b:13:6c:4c:66:74:63:98. Notice f6:e5 in the second and third bytes.

When I run my Python recreation with the same operands, I get 22:f5:e6:10:e3:de:e1:fe:6b:13:6c:4c:66:74:63:98. Notice f5:e6 in the second and third bytes.

Here's the Python code:

source = bytearray([0xef,0xbe,0xad,0xde,0xad,0xde,0xe1,0xfe,0x37,0x13,0x37,0x13,0x66,0x74,0x63,0x67])
dest = bytearray([0x33,0x37,0x38,0x32,0x36,0x00,0x00,0x00,0x34,0x00,0x35,0x39,0x00,0x00,0x00,0x31])

def PADDD(dst, src, length):
    mask = 0xFFFFFFFF
    word1 = int.from_bytes(dst[:4], 'big') + int.from_bytes(src[:4], 'big')
    word2 = int.from_bytes(dst[4:8], byteorder='big', signed=False) + int.from_bytes(src[4:8], byteorder='big', signed=False)
    word3 = int.from_bytes(dst[8:12], byteorder='big', signed=False) + int.from_bytes(src[8:12], byteorder='big', signed=False)
    word4 = int.from_bytes(dst[12:], byteorder='big', signed=False) + int.from_bytes(src[12:], byteorder='big', signed=False)
    res = word1 & mask
    res <<= 32
    res |= word2 & mask
    res <<= 32
    res |= word3 & mask
    res <<= 32
    res |= word4 & mask
    return bytearray(res.to_bytes(16, 'big'))

PADDD(dest, source, 16)

I can't for the life of me figure out what might be going wrong, especially since it's only two bytes that are wrong.
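
One possible explanation, assuming the byte strings above are dumps in memory (address) order: x86 is little-endian, so the first byte of each 4-byte group is the least significant byte of that 32-bit lane, and carries propagate from left to right in the dump. Reading the lanes as big-endian makes the carry travel the other way, which would account for exactly two neighbouring bytes being off by one in opposite directions. The sketch below is not the code from the binary; `paddd_lanes` and its `byteorder` parameter are hypothetical names used only to compare the two interpretations.

```python
def paddd_lanes(dst, src, byteorder="little"):
    """Add four 32-bit lanes with wraparound (no carry between lanes).

    `byteorder` is a hypothetical knob for comparing interpretations;
    real x86 hardware always treats the lanes as little-endian.
    """
    if len(dst) != 16 or len(src) != 16:
        raise ValueError("expected two 16-byte operands")
    out = bytearray()
    for i in range(0, 16, 4):
        a = int.from_bytes(dst[i:i + 4], byteorder)
        b = int.from_bytes(src[i:i + 4], byteorder)
        # Mask to 32 bits so the carry out of a lane is discarded.
        out += ((a + b) & 0xFFFFFFFF).to_bytes(4, byteorder)
    return bytes(out)


def dump(data):
    """Format bytes as colon-separated hex, matching the dumps above."""
    return ":".join(f"{b:02x}" for b in data)


source = bytes.fromhex("efbeaddeaddee1fe3713371366746367")
dest = bytes.fromhex("33373832360000003400353900000031")

little = paddd_lanes(dest, source, "little")
big = paddd_lanes(dest, source, "big")

print(dump(little))  # 22:f6:e5:10:e3:de:e1:fe:6b:13:6c:4c:66:74:63:98
print(dump(big))     # 22:f5:e6:10:e3:de:e1:fe:6b:13:6c:4c:66:74:63:98
```

With these operands the little-endian reading reproduces the bytes from the real ELF, and the big-endian reading reproduces the output of my function above. Only the first lane differs, because it is the only one where a carry crosses between the two middle bytes.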

python
integer
x86-64
sse
simd
asked on Stack Overflow Dec 8, 2020 by not_only_but_also • edited Dec 8, 2020 by not_only_but_also

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0