Python and Bash handle hex (shellcode) differently? Inconsistent?

Question

So I've been working on a simple format string exploit and for the past 3 hours or so I have been bashing my head against the table wondering why my hex values weren't appearing on the stack.

If anyone can enlighten me, I would appreciate it a lot.

1.

Initially I was using Python for scripting these challenges, and for this example in particular:

    python -c 'print "AAAAA\xcc\xd5\xff\x4f"' > a

Then I viewed the stack in GDB:

    format string> 
    0xffffd550: 0xffffd584  0xf7ffdab8  0x41f95300  0x41414141
    0xffffd560: 0x95c38cc3  0x0a4fbfc3  0xf7e2ec00  0xf7f8f820

As you can see, the address does not appear after the "AAAAA" (I used five A's because the buffer is not word-aligned).

2.

However, when I use another address that I had been previously working with:

    python -c 'print "AAAAA\x5c\x57\x55\x56"' > a

I get:

    format string> 
    0xffffd550: 0xffffd584  0xf7ffdab8  0x41f95300  0x41414141
    0xffffd560: 0x5655575c  0x0000000a  0xf7e2ec69  0xf7f8f820

And it seems perfectly fine?

3.

Also, when I use something like:

    echo -en "AAAAA\xcc\xd5\xff\x4f" > b

I am able to place the value on the stack correctly:

    format string> 
    0xffffd550: 0xffffd584  0xf7ffdab8  0x41f95300  0x41414141
    0xffffd560: 0x4fffd5cc  0x00000000  0xf7e2ec69  0xf7f8f820

Below are the outputs of the files a and b respectively:

    AAAAA���O
    AAAAAÌÕÿO

Tags: python, bash, hex, shellcode, format-string

Asked on Stack Overflow Jun 2, 2018 by J Z; edited Jun 3, 2018 by ottomeister.

1 Answer

The problem with the first example is that your string contains values greater than 0x7F. Python treats the string as text (a sequence of characters), not as raw bytes, so when it writes the string out it encodes those characters using the output encoding chosen from your system and locale settings, which in your case is UTF-8.

UTF-8 expresses characters 0x7F and lower as themselves, so the A and \x4f characters are written out unchanged. However, UTF-8 encodes characters with values above 0x7F as a sequence of multiple bytes. In this case the characters greater than 0x7F are \xcc, \xd5 and \xff, whose UTF-8 encodings are 0xC3 0x8C, 0xC3 0x95 and 0xC3 0xBF respectively. Those are the values that show up in your memory dump; because x86 stores each 32-bit word little-endian, the bytes c3 8c c3 95 appear as the word 0x95c38cc3.
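To check this, you can reproduce the encoding in Python 3 and regroup the emitted bytes into little-endian 32-bit words, the way GDB displays them. This is a sketch that assumes stdout uses UTF-8 and that the payload starts at 0xffffd55b, as in the dumps in the question (the first "A" sits in the top byte of the 0x41f95300 word).

```python
import struct

# The payload from the question. In Python 3 this is a str, so "\xcc" is
# the Unicode character U+00CC, not a raw byte.
payload = "AAAAA\xcc\xd5\xff\x4f"

# Approximates what print() writes when stdout's encoding is UTF-8:
# the text is encoded and a trailing newline is appended.
emitted = (payload + "\n").encode("utf-8")

# Show each emitted byte in hex.
print(" ".join(f"{b:02x}" for b in emitted))
# 41 41 41 41 41 c3 8c c3 95 c3 bf 4f 0a

# Assumption: the first "A" fills the top byte of the previous word, so the
# remaining 12 bytes fill three aligned words. "<" means little-endian (x86).
words = struct.unpack("<3I", emitted[1:13])
print(" ".join(f"0x{w:08x}" for w in words))
# 0x41414141 0x95c38cc3 0x0a4fbfc3
```

The last two words match the dump from the first example exactly, including the newline byte 0x0a.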

You could get around this by forcing Python to emit the string using an encoding that passes values above 0x7F through unchanged. "latin1" is such an encoding (it maps each character from 0x00 to 0xFF to the single byte with the same value), so you could use this command:

    python -c 'print u"AAAAA\xcc\xd5\xff\x4f".encode("latin1")'

but that's ugly.
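A short Python 3 sketch shows why Latin-1 helps, and also why the second example in the question worked: Latin-1 turns each \xNN back into the raw byte 0xNN, and a string with no character above 0x7F encodes to identical bytes under UTF-8 and Latin-1. It assumes the same stack offset as the dumps in the question.

```python
import struct

# Latin-1 maps every code point 0x00-0xFF to the byte with the same value,
# so encoding with it turns the \xNN escapes back into raw bytes.
payload = "AAAAA\xcc\xd5\xff\x4f"
raw = payload.encode("latin1")
assert raw == b"AAAAA\xcc\xd5\xff\x4f"

# Assumed layout as in the dump: skip the first "A", read two little-endian words.
words = struct.unpack("<2I", raw[1:9])
print(" ".join(f"0x{w:08x}" for w in words))  # 0x41414141 0x4fffd5cc

# The second address from the question has no byte above 0x7F,
# so UTF-8 and Latin-1 produce the same bytes for it.
other = "AAAAA\x5c\x57\x55\x56"
print(other.encode("utf-8") == other.encode("latin1"))  # True
```

The word 0x4fffd5cc is the same value the `echo -en` version placed on the stack.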

Also, print always appends a newline character (0x0A) to the end of the string. It shows up in your memory dump in the word after the values you intended to deliver (0x0a4fbfc3 in the first example, 0x0000000a in the second). You can get around that by writing:

    python -c 'import sys; sys.stdout.write(u"AAAAA\xcc\xd5\xff\x4f".encode("latin1"))'

but that's even uglier.

I'd forget trying to use a Python one-liner for this and stick with the echo -ne approach.
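If you do want to stay in Python 3, one way to avoid both the encoding step and the newline is to write a bytes object to a binary stream instead of printing a str. This is a minimal sketch: `PAYLOAD` and `emit_payload` are illustrative names, not part of the original answer, and the demo writes to an in-memory buffer rather than a real file.

```python
import io
from typing import BinaryIO

# Illustrative (hypothetical) names, not from the original answer.
# A bytes literal: each \xNN is exactly one byte, with no text encoding step.
PAYLOAD = b"AAAAA\xcc\xd5\xff\x4f"


def emit_payload(stream: BinaryIO) -> None:
    """Write PAYLOAD to a binary stream unchanged, with no trailing newline."""
    stream.write(PAYLOAD)
    stream.flush()


# Demo against an in-memory stream. For a shell redirect you would pass
# sys.stdout.buffer, the binary layer underneath sys.stdout.
buf = io.BytesIO()
emit_payload(buf)
print(buf.getvalue().hex())  # 4141414141ccd5ff4f
```

From a shell, the same idea fits in a Python 3 one-liner that should produce the same bytes as the `echo -en` command:

    python3 -c 'import sys; sys.stdout.buffer.write(b"AAAAA\xcc\xd5\xff\x4f")' > a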

Answered on Stack Overflow Jun 3, 2018 by ottomeister.

User contributions licensed under CC BY-SA 3.0