Difficulties Understanding Format String Exploitation


I am reading Hacking: The Art of Exploitation, 2nd Edition, and I'm on the chapter about format string vulnerabilities. I have read the chapter multiple times, but I still can't clearly understand it, even after some googling.

So, in the book there is this vulnerable code:

 char text[1024];
 strcpy(text, argv[1]);
 printf("The right way to print user-controlled input:\n");
 printf("%s", text);
 printf("\nThe wrong way to print user-controlled input:\n");
 printf(text);

Then after compiling,

reader@hacking:~/booksrc $ ./fmt_vuln $(perl -e 'print "%08x."x40')
The right way to print user-controlled input:
The wrong way to print user-controlled input:

The bytes 0x25, 0x30, 0x38, 0x78, and 0x2e seem to be repeating a lot.

reader@hacking:~/booksrc $ printf "\x25\x30\x38\x78\x2e\n"

First, why is that value repeating itself?

As you can see, they’re the memory for the format string itself. Because the format function will always be on the highest stack frame, as long as the format string has been stored anywhere on the stack, it will be located below the current frame pointer (at a higher memory address).

But it seems to me that this contradicts what he wrote earlier and the way stack frames are organized:

When this printf() function is called (as with any function), the arguments are pushed to the stack in reverse order.

So, shouldn't the format string be at a lower memory address since it is the first argument? And where is the format string stored?

reader@hacking:~/booksrc $ ./fmt_vuln AAAA%08x.%08x.%08x.%08x
The right way to print user-controlled input:
The wrong way to print user-controlled input:

Here again, why does AAAA show up as 41414141? From what I understand, printf prints AAAA first. When it sees the first %08x, it fetches a value from a memory address in the preceding stack frame, and it does the same for the second %08x, so the second value is located at a higher memory address than the first. Yet it finally returns to the value of AAAA, which should be at a lower memory address, in printf's own stack frame.

I debugged the first example with $(perl -e 'print "%08x."x40') as the argument. I am running Linux 5.3.0-40-generic, 18.04.1-Ubuntu, x86_64.

(gdb) run $(perl -e 'print "%08x." x 40')
Starting program: /home/kuro/fmt_vuln $(perl -e 'print "%08x." x 40')
The right way to print user-controlled input:
The wrong way to print user-controlled input:

Breakpoint 1, main (argc=2, argv=0x7ffd9d357fc8) at fmt_vuln.c:19
19      printf("[*] test_val @ 0x%08x = %d 0x%08x\n", &test_val, test_val, test_val);
(gdb) x/-100xw $rsp
0x7ffd9d357940: 0x00000400  0x00000000  0x4b07c1aa  0x00007fb8
0x7ffd9d357950: 0x00000016  0x00000000  0x00000003  0x00000000
0x7ffd9d357960: 0x00000001  0x00000000  0x00002190  0x000003e8
0x7ffd9d357970: 0x00000005  0x00000000  0x00008800  0x00000000
0x7ffd9d357980: 0x00000000  0x00000000  0x00000400  0x00000000
0x7ffd9d357990: 0x00000000  0x00000000  0x5e970730  0x00000000
0x7ffd9d3579a0: 0x65336234  0x30663666  0x90890300  0x79e57be9
0x7ffd9d3579b0: 0x1cd79dbf  0x00000000  0x00000000  0x00000000
0x7ffd9d3579c0: 0x05cec660  0x000055ef  0x9d357fc0  0x00007ffd
0x7ffd9d3579d0: 0x00000000  0x00000000  0x00000000  0x00000000
0x7ffd9d3579e0: 0x9d357ee0  0x00007ffd  0x4b062f26  0x00007fb8
0x7ffd9d3579f0: 0x00000030  0x00000030  0x9d357be8  0x00007ffd
0x7ffd9d357a00: 0x9d357a10  0x00007ffd  0x90890300  0x79e57be9
0x7ffd9d357a10: 0x4b3ea760  0x00007fb8  0x07a51260  0x000055ef
0x7ffd9d357a20: 0x4b3eb8c0  0x00007fb8  0x4b0891bd  0x00007fb8
0x7ffd9d357a30: 0x00000000  0x00000000  0x4b3ea760  0x00007fb8
0x7ffd9d357a40: 0x00000d68  0x00000000  0x00000169  0x00000000
0x7ffd9d357a50: 0x07a51260  0x000055ef  0x4b08af51  0x00007fb8
0x7ffd9d357a60: 0x4b3e62a0  0x00007fb8  0x4b3ea760  0x00007fb8
0x7ffd9d357a70: 0x0000000a  0x00000000  0x05cec660  0x000055ef
0x7ffd9d357a80: 0x9d357fc0  0x00007ffd  0x00000000  0x00000000
0x7ffd9d357a90: 0x00000000  0x00000000  0x4b08b403  0x00007fb8
0x7ffd9d357aa0: 0x4b3ea760  0x00007fb8  0x9d357ee0  0x00007ffd
0x7ffd9d357ab0: 0x05cec660  0x000055ef  0x4b0808f5  0x00007fb8
0x7ffd9d357ac0: 0x00000000  0x00000000  0x05cec824  0x000055ef
(gdb) x/100xw $rsp
0x7ffd9d357ad0: 0x9d357fc8  0x00007ffd  0x9d357b10  0x00000002
0x7ffd9d357ae0: 0x78383025  0x3830252e  0x30252e78  0x252e7838
0x7ffd9d357af0: 0x2e783830  0x78383025  0x3830252e  0x30252e78
0x7ffd9d357b00: 0x252e7838  0x2e783830  0x78383025  0x3830252e
0x7ffd9d357b10: 0x30252e78  0x252e7838  0x2e783830  0x78383025
0x7ffd9d357b20: 0x3830252e  0x30252e78  0x252e7838  0x2e783830
0x7ffd9d357b30: 0x78383025  0x3830252e  0x30252e78  0x252e7838
0x7ffd9d357b40: 0x2e783830  0x78383025  0x3830252e  0x30252e78
0x7ffd9d357b50: 0x252e7838  0x2e783830  0x78383025  0x3830252e
0x7ffd9d357b60: 0x30252e78  0x252e7838  0x2e783830  0x78383025
0x7ffd9d357b70: 0x3830252e  0x30252e78  0x252e7838  0x2e783830
0x7ffd9d357b80: 0x78383025  0x3830252e  0x30252e78  0x252e7838
0x7ffd9d357b90: 0x2e783830  0x78383025  0x3830252e  0x30252e78
0x7ffd9d357ba0: 0x252e7838  0x2e783830  0x4b618d00  0x00007fb8
0x7ffd9d357bb0: 0x4b5fd000  0x00007fb8  0x00000000  0x00000000
0x7ffd9d357bc0: 0x9d357c80  0x00007ffd  0x00000000  0x00000000
0x7ffd9d357bd0: 0x00000000  0x00000000  0x00000000  0x00000000
0x7ffd9d357be0: 0x4b3ef6f0  0x00007fb8  0x4b6184c8  0x00007fb8
0x7ffd9d357bf0: 0x9d357c80  0x00007ffd  0x4b3ef000  0x00007fb8
0x7ffd9d357c00: 0x4b3ef914  0x00007fb8  0x4b3ef3c0  0x00007fb8
0x7ffd9d357c10: 0x4b617048  0x00007fb8  0x00000000  0x00000000
0x7ffd9d357c20: 0x00000000  0x00000000  0x4b6179f0  0x00007fb8
0x7ffd9d357c30: 0x4b0030e8  0x00007fb8  0x00000000  0x00000000
0x7ffd9d357c40: 0x4b3efa00  0x00007fb8  0x00000480  0x00000000
0x7ffd9d357c50: 0x00000027  0x00000000  0x00000000  0x00000000

The values that appear before the "%08x." bytes in the wrong-way output sit at lower addresses than the "%08x." bytes. Why? The format string is supposed to be at the top of the stack.

The values that appear after the "%08x." bytes in the wrong-way output sit at higher addresses than the "%08x." bytes, so they are in the preceding stack frame.

Why is it like this? Shouldn't the output begin from the format string values, or after?

Also, in the book, no values are printed after the "%08x." values, but some are printed in my case. And some values in the output, like 4b16c3a0, don't even appear in the stack dump.

asked on Stack Overflow Apr 14, 2020 by Eye Patch • edited Apr 15, 2020 by Eye Patch

1 Answer


I have to recommend against what you're doing. You're focussing on security vulnerabilities in C without a strong understanding of the language itself. That's an exercise in frustration. As evidence, I offer that every question you're posing about the exercise is answered by understanding printf(3), not stack vulnerabilities.

The output of your perl line (the contents of argv[1]) starts with %08x.%08x.%08x.%08x.%08x. That's a format string. Each %08x asks for one more printf argument, an integer to print in hexadecimal. Normally, you might do something like,

int a = 'B';
printf( "%02x\n", a );

which produces 42 much faster than the computer in the Hitchhiker's Guide to the Galaxy.
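The same byte-level reading explains the repeating words in the gdb dump from the question. Words such as 0x78383025 are the characters of "%08x." viewed four at a time as little-endian 32-bit integers. A minimal sketch, where le32 is a hypothetical helper name chosen for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: interpret the first four bytes of s as a
 * little-endian 32-bit word, which is how gdb's x/xw displays
 * memory on x86. The caller must supply at least four bytes. */
uint32_t le32(const char *s)
{
    assert(s != NULL);
    const unsigned char *b = (const unsigned char *)s;
    return  (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

With this helper, le32("%08x.") gives 0x78383025 and le32("AAAA") gives 0x41414141, so the 41414141 in the second experiment is simply the start of the input string being printed back as an integer.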

What you've done is pass a long format string with zero arguments. printf(3) can't know how many arguments it was passed; it has to infer them from the format string. Your format string tells printf to print a long list of integers. Since none were provided, it looks for them wherever they should have been: in the argument registers first on x86_64 (which is why some printed values never show up in a stack dump), then "up the stack". You print nonsense because the contents of those locations are unpredictable or, at any rate, weren't defined by you.
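To make the "infers them from the format string" point concrete, here is a rough sketch of a variadic function that, like printf, has no argument count and simply trusts its format string. Both count_conversions and sum_args are hypothetical names invented for this illustration, and the parsing is deliberately simplified; it is not how any real printf is implemented.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Count how many arguments a printf-style format string asks for.
 * Simplified on purpose: every '%' that is not part of "%%" counts
 * as one conversion; '*' widths and "%n$" forms are not handled. */
size_t count_conversions(const char *fmt)
{
    size_t n = 0;
    assert(fmt != NULL);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {      /* "%%" is a literal percent sign */
            p++;
            continue;
        }
        n++;
    }
    return n;
}

/* Hypothetical variadic function: it fetches one int per conversion
 * found in fmt. Nothing tells it how many ints were really passed. */
long sum_args(const char *fmt, ...)
{
    va_list ap;
    long total = 0;
    size_t n = count_conversions(fmt);

    va_start(ap, fmt);
    for (size_t i = 0; i < n; i++)
        total += va_arg(ap, int);   /* undefined behavior if too few were passed */
    va_end(ap);
    return total;
}
```

Calling sum_args("%x %x %x", 1, 2, 3) is well defined, but calling it with the same format and no further arguments would read whatever happens to be in the argument registers and on the stack, which is the situation printf(text) creates.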

In the "good" case, the format string is "%s", declaring one argument of type string, which you provided. That works much better, yes.
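A small sketch of the "good" pattern, assuming a hypothetical helper named format_user_text: the format string is a constant, and the user-controlled text is only ever passed as data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy user-controlled text into out without
 * interpreting it. The format is the constant "%s", so any '%'
 * characters inside user are copied verbatim, never parsed. */
int format_user_text(char *out, size_t size, const char *user)
{
    assert(out != NULL && user != NULL);
    return snprintf(out, size, "%s", user);
}
```

Passing "%08x.%08x." as user leaves exactly those ten characters in the buffer. When the goal is only to print, fputs(text, stdout) is another option, because it takes no format string at all.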

Most compilers nowadays take special care with printf. They can produce warnings if the format string isn't a compile-time constant, and they can verify that each argument is of the correct type for its corresponding format specifier. The whole chapter in your book can thus be made moot simply by using the compiler's capabilities and paying attention to its diagnostics.
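As an illustration of those capabilities, GCC and Clang offer a format attribute that extends the same checking to your own printf-style wrappers. The sketch below assumes one of those compilers; log_line is a hypothetical function name, not something from the book.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* GCC/Clang extension: argument 3 is a printf-style format string
 * and the values to check against it start at argument 4. */
int log_line(char *out, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

int log_line(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    assert(out != NULL);
    va_start(ap, fmt);
    int n = vsnprintf(out, size, fmt, ap);  /* forwards the variadic arguments */
    va_end(ap);
    return n;
}
```

With GCC, -Wall turns on -Wformat, and adding -Wformat-security makes the compiler warn about calls such as printf(text), where the format is not a string literal and no arguments follow it.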

answered on Stack Overflow Apr 15, 2020 by James K. Lowden

User contributions licensed under CC BY-SA 3.0