I was reverse-engineering an application and came across this opcode:
PSHUFB XMM2, XMMWORD_ADDRESS
I tried to implement this instruction's algorithm in Python, with no success. The reference for how this opcode should work is here: http://www.felixcloutier.com/x86/PSHUFB.html
Here is the pseudocode from that page:
PSHUFB (with 128 bit operands)
for i = 0 to 15 {
    if (SRC[(i * 8)+7] = 1 ) then
        DEST[(i*8)+7..(i*8)+0] ← 0;
    else
        index[3..0] ← SRC[(i*8)+3 .. (i*8)+0];
        DEST[(i*8)+7..(i*8)+0] ← DEST[(index*8+7)..(index*8+0)];
    endif
}
DEST[VLMAX-1:128] ← 0
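A direct Python 3 transcription of the pseudocode might look like the sketch below (not Intel's reference; the name pshufb_128 is hypothetical). Bytes are indexed in memory order, so byte i is bits (i*8)+7..(i*8)+0, and every lookup reads from a saved copy of the original DEST, because each result byte is selected from DEST as it was before the instruction ran.

```python
def pshufb_128(dest: bytes, src: bytes) -> bytes:
    """Sketch of 128-bit PSHUFB; both operands are 16 bytes in memory order."""
    if len(dest) != 16 or len(src) != 16:
        raise ValueError("PSHUFB with 128-bit operands needs 16-byte inputs")
    temp = bytes(dest)            # original DEST; all lookups read from this copy
    result = bytearray(16)
    for i in range(16):
        if src[i] & 0x80:         # SRC[(i*8)+7] = 1: this byte becomes 0
            result[i] = 0
        else:                     # index[3..0] = SRC[(i*8)+3 .. (i*8)+0]
            index = src[i] & 0x0F
            result[i] = temp[index]
    return bytes(result)
```

For example, a mask of bytes 15, 14, ..., 0 reverses DEST, a mask of all 0x80 clears it, and bits 6..4 of each mask byte are ignored.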
I'm trying to implement the 128-bit version of this instruction, with no success. Here are the values of XMM2 before and after the instruction:
Before
WINDBG>r xmm2
xmm2= 0 3.78351e-044 6.09194e+027 6.09194e+027
After
WINDBG>r xmm2
xmm2=9.68577e-042 0 4.92279e-029 4.92279e-029
In Python you can use the struct module to convert those floats to hex:
hex(struct.unpack('<I', struct.pack('<f', f))[0])
So, roughly, these are the hex values of XMM2 before and after the PSHUFB instruction:
Before
xmm2 = 0 0x0000001b 0x6d9d7914 0x6d9d7914
After
xmm2 = 00001b00 00000000 10799d78 10799d78
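The struct one-liner can be wrapped into two small helpers (float_to_hex and hex_to_float are hypothetical names). One caveat is worth checking: WinDbg printed each float with only 6 significant digits, while a 32-bit float carries about 7, so hex recovered from the printed text can be off in the low byte of each dword. The sketch below assumes hypothetical exact dwords 0x6d9d7910 (which prints as the Before float) and its byte-reversal 0x10799d6d (which prints as the After float), formats each the way WinDbg did ('%g' keeps 6 significant digits), and converts it back.

```python
import struct

def float_to_hex(f: float) -> int:
    """Raw bits of f after rounding it to a 32-bit IEEE 754 float."""
    return struct.unpack('<I', struct.pack('<f', f))[0]

def hex_to_float(h: int) -> float:
    """The 32-bit float whose raw bits are h."""
    return struct.unpack('<f', struct.pack('<I', h))[0]

# Hypothetical exact dwords (WinDbg never showed these bits directly).
for exact in (0x6d9d7910, 0x10799d6d):
    printed = '%g' % hex_to_float(exact)      # 6 significant digits, like WinDbg
    recovered = float_to_hex(float(printed))  # what the struct conversion yields
    print(printed, hex(exact), '->', hex(recovered))

# 6.09194e+27 0x6d9d7910 -> 0x6d9d7914
# 4.92279e-29 0x10799d6d -> 0x10799d78
```

Under that assumption, a byte like 0x78 (or the 0x14 in the Before value) would be rounding noise from the float-to-hex conversion rather than real register data; reading the register's raw bytes avoids the problem.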
And most importantly (I almost forgot), the value at XMMWORD_ADDRESS is:
03 02 01 00 07 06 05 04 0D 0C 0B 0A 09 08 80 80
xmmword 808008090A0B0C0D0405060700010203h
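The two lines above are the same 16 bytes: the first lists them in memory order (byte 0 first), the second is IDA's 128-bit constant, written most-significant byte first. A quick Python 3.8+ check (the variable names are only for illustration), relying on x86 being little-endian:

```python
# IDA shows the xmmword as one 128-bit number, most significant byte first.
mask_value = 0x808008090A0B0C0D0405060700010203

# Little-endian: byte 0 in memory is the least significant byte of the number.
mask = mask_value.to_bytes(16, 'little')

print(mask.hex(' '))   # 03 02 01 00 07 06 05 04 0d 0c 0b 0a 09 08 80 80
```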
An implementation in Python would be highly appreciated; one in C would work as well. Or maybe just an explanation of how the hell it works, because I couldn't understand the Intel reference.
This is the code I have so far:
x = ['00', '00', '00', '00', '00', '00', '00', '1b', '6d', '9d', '79', '14', '6d', '9d', '79', '14']
s = ['03', '02', '01', '00', '07', '06', '05', '04', '0D', '0C', '0B', '0A', '09', '08', '80', '80']
new = []
for i in range(16):
    if 0x80 == int(s[i], 16) & 0x80:
        print("MSB", s[i])
        new.append('00')
    else:
        print("NOT MSB", s[i])
        new.append(x[int(s[i], 16) & 15])
print(x)
print(new)
Here x is XMM2 (the destination) and s is the source mask (SRC).
The output I get is:
['00', '00', '00', '00', '00', '00', '00', '1b', '6d', '9d', '79', '14', '6d', '9d', '79', '14']
['00', '00', '00', '00', '1b', '00', '00', '00', '9d', '6d', '14', '79', '9d', '6d', '00', '00']
whereas I should get:
['00', '00', '1b', '00', '00', '00', '00', '00', '10', '79', '9d', '78', '10', '79', '9d', '78']
Something else I just noticed: the expected output (the After value) contains the byte 0x78, which appears nowhere in the input. Where could it come from?
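Judging by the numbers, one likely cause of the mismatch is byte order: x is written the way WinDbg displays XMM2 (highest dword first, each dword most-significant byte first), while s is in memory order (byte 0 first), and PSHUFB indexes bytes in memory order. A minimal Python 3 sketch under that assumption reverses x before the lookup and reverses the result back for comparison (all helper names are hypothetical):

```python
def pshufb_128(dest: bytes, mask: bytes) -> bytes:
    """128-bit PSHUFB on 16-byte values in memory order (byte 0 first)."""
    return bytes(0 if m & 0x80 else dest[m & 0x0F] for m in mask)

def display_to_memory(hex_bytes):
    """Display order (most significant byte first) -> bytes in memory order."""
    return bytes(int(b, 16) for b in reversed(hex_bytes))

def memory_to_display(data: bytes):
    """Bytes in memory order -> hex strings, most significant byte first."""
    return ['%02x' % b for b in reversed(data)]

x = ['00', '00', '00', '00', '00', '00', '00', '1b',
     '6d', '9d', '79', '14', '6d', '9d', '79', '14']   # display order
s = ['03', '02', '01', '00', '07', '06', '05', '04',
     '0D', '0C', '0B', '0A', '09', '08', '80', '80']   # already memory order

dest = display_to_memory(x)
mask = bytes(int(b, 16) for b in s)
print(memory_to_display(pshufb_128(dest, mask)))
# ['00', '00', '1b', '00', '00', '00', '00', '00', '14', '79', '9d', '6d', '14', '79', '9d', '6d']
```

The bytes that still differ from the expected list (0x14 vs 0x10, 0x6d vs 0x78) are each the low byte of a dword either before or after the shuffle, which is where rounding in the float-to-hex conversion would show up.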
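It works like 16 parallel table lookups, with special handling for indices that have their top bit set: if bit 7 of a mask byte is set, the result byte is zeroed; otherwise the low 4 bits of the mask byte select one of the 16 bytes of DEST. For example, it could look like this (not tested, not Python):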
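uint8_t new_dest[16];
// bit 7 set: zero the byte; otherwise the low 4 bits pick a byte of dest
for (int i = 0; i < 16; i++)
    new_dest[i] = (src[i] & 0x80) ? 0 : dest[src[i] & 15];
memcpy(dest, new_dest, 16); // dest = new_dest, all 16 bytes at once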
The new_dest there is significant, because it's really 16 parallel assignments, i.e. read-before-write: the second lookup is not affected by what happened to the first byte, and so on. Intel's code snippet leaves that implicit (or is wrong, depending on how you look at it).
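To see why the copy matters, here is a Python 3 sketch comparing a naive in-place loop (what a literal reading of Intel's snippet suggests) with the copy-based version; the function names and the swap mask are made up for illustration:

```python
def pshufb_in_place_wrong(dest: bytearray, src: bytes) -> None:
    """Naive version: later lookups can see bytes that were already overwritten."""
    for i in range(16):
        dest[i] = 0 if src[i] & 0x80 else dest[src[i] & 15]

def pshufb_copy(dest: bytearray, src: bytes) -> None:
    """Read-before-write: every lookup uses the original dest."""
    new_dest = bytes(0 if src[i] & 0x80 else dest[src[i] & 15] for i in range(16))
    dest[:] = new_dest

mask = bytes([1, 0] + list(range(2, 16)))   # swap bytes 0 and 1, keep the rest

a = bytearray(range(16))
pshufb_in_place_wrong(a, mask)
b = bytearray(range(16))
pshufb_copy(b, mask)
print(list(a[:2]), list(b[:2]))   # [1, 1] [1, 0]
```

In the in-place loop, byte 1 reads byte 0 after it has already been overwritten, so the swap turns into a duplicate.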
User contributions licensed under CC BY-SA 3.0