I'm trying to understand the difference in behavior between code compiled with the GCC option -mpreferred-stack-boundary=2 and code compiled with the default value, -mpreferred-stack-boundary=4.
I have already read a lot of Q&A about this option, but I am not able to understand the case described below.
Let's consider this code:
#include <stdio.h>
#include <string.h>
void dumb_function() {}
int main(int argc, char** argv) {
    dumb_function();
    char buffer[24];
    strcpy(buffer, argv[1]);
    return 0;
}
On my 64-bit machine, I want to compile it for 32 bits, so I use the -m32 option. I create two binaries, one with -mpreferred-stack-boundary=2 and one with the default value:
sysctl -w kernel.randomize_va_space=0
gcc -m32 -g3 -fno-stack-protector -z execstack -o default vuln.c
gcc -mpreferred-stack-boundary=2 -m32 -g3 -fno-stack-protector -z execstack -o align_2 vuln.c
Now, if I execute them with an overflow of two bytes, I get a segmentation fault with the default alignment but not in the other case:
$ ./default 1234567890123456789012345
Segmentation fault (core dumped)
$ ./align_2 1234567890123456789012345
$
I tried to dig into why the default binary behaves this way. Here is the disassembly of the main function:
08048411 <main>:
8048411: 8d 4c 24 04 lea 0x4(%esp),%ecx
8048415: 83 e4 f0 and $0xfffffff0,%esp
8048418: ff 71 fc pushl -0x4(%ecx)
804841b: 55 push %ebp
804841c: 89 e5 mov %esp,%ebp
804841e: 53 push %ebx
804841f: 51 push %ecx
8048420: 83 ec 20 sub $0x20,%esp
8048423: 89 cb mov %ecx,%ebx
8048425: e8 e1 ff ff ff call 804840b <dumb_function>
804842a: 8b 43 04 mov 0x4(%ebx),%eax
804842d: 83 c0 04 add $0x4,%eax
8048430: 8b 00 mov (%eax),%eax
8048432: 83 ec 08 sub $0x8,%esp
8048435: 50 push %eax
8048436: 8d 45 e0 lea -0x20(%ebp),%eax
8048439: 50 push %eax
804843a: e8 a1 fe ff ff call 80482e0 <strcpy@plt>
804843f: 83 c4 10 add $0x10,%esp
8048442: b8 00 00 00 00 mov $0x0,%eax
8048447: 8d 65 f8 lea -0x8(%ebp),%esp
804844a: 59 pop %ecx
804844b: 5b pop %ebx
804844c: 5d pop %ebp
804844d: 8d 61 fc lea -0x4(%ecx),%esp
8048450: c3 ret
8048451: 66 90 xchg %ax,%ax
8048453: 66 90 xchg %ax,%ax
8048455: 66 90 xchg %ax,%ax
8048457: 66 90 xchg %ax,%ax
8048459: 66 90 xchg %ax,%ax
804845b: 66 90 xchg %ax,%ax
804845d: 66 90 xchg %ax,%ax
804845f: 90 nop
Thanks to the sub $0x20,%esp instruction, we can see that the compiler allocates 32 bytes on the stack, which is consistent with the -mpreferred-stack-boundary=4 option: 32 is a multiple of 16.
First question: why, if I have 32 bytes of stack (24 bytes for the buffer and the rest junk), do I get a segmentation fault with an overflow of just two bytes?
Let's look at what is happening with gdb:
$ gdb default
(gdb) b 10
Breakpoint 1 at 0x804842a: file vuln.c, line 10.
(gdb) b 12
Breakpoint 2 at 0x8048442: file vuln.c, line 12.
(gdb) r 1234567890123456789012345
Starting program: /home/pierre/example/default 1234567890123456789012345
Breakpoint 1, main (argc=2, argv=0xffffce94) at vuln.c:10
10 strcpy(buffer, argv[1]);
(gdb) i f
Stack level 0, frame at 0xffffce00:
eip = 0x804842a in main (vuln.c:10); saved eip = 0xf7e07647
source language c.
Arglist at 0xffffcde8, args: argc=2, argv=0xffffce94
Locals at 0xffffcde8, Previous frame's sp is 0xffffce00
Saved registers:
ebx at 0xffffcde4, ebp at 0xffffcde8, eip at 0xffffcdfc
(gdb) x/6x buffer
0xffffcdc8: 0xf7e1da60 0x080484ab 0x00000002 0xffffce94
0xffffcdd8: 0xffffcea0 0x08048481
(gdb) x/x buffer+36
0xffffcdec: 0xf7e07647
Just before the call to strcpy, we can see the saved eip is 0xf7e07647. We can find this value again starting from the buffer address (32 bytes for the stack + 4 bytes for the esp = 36 bytes).
Let's continue:
(gdb) c
Continuing.
Breakpoint 2, main (argc=0, argv=0x0) at vuln.c:12
12 return 0;
(gdb) i f
Stack level 0, frame at 0xffff0035:
eip = 0x8048442 in main (vuln.c:12); saved eip = 0x0
source language c.
Arglist at 0xffffcde8, args: argc=0, argv=0x0
Locals at 0xffffcde8, Previous frame's sp is 0xffff0035
Saved registers:
ebx at 0xffffcde4, ebp at 0xffffcde8, eip at 0xffff0031
(gdb) x/7x buffer
0xffffcdc8: 0x34333231 0x38373635 0x32313039 0x36353433
0xffffcdd8: 0x30393837 0x34333231 0xffff0035
(gdb) x/x buffer+36
0xffffcdec: 0xf7e07647
We can see the overflow in the bytes right after the buffer: 0xffff0035. Also, where the eip was stored, nothing changed (0xffffcdec: 0xf7e07647) because the overflow is only two bytes. However, the saved eip reported by info frame changed to saved eip = 0x0, and the segmentation fault occurs if I continue:
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
What's happening? Why did my saved eip change when the overflow is only two bytes?
Now, let's compare this with the binary compiled with another alignment:
$ objdump -d align_2
...
08048411 <main>:
...
8048414: 83 ec 18 sub $0x18,%esp
...
The stack is exactly 24 bytes. That means an overflow of 2 bytes will overwrite the esp (but still not the eip). Let's check that with gdb:
(gdb) b 10
Breakpoint 1 at 0x804841c: file vuln.c, line 10.
(gdb) b 12
Breakpoint 2 at 0x8048431: file vuln.c, line 12.
(gdb) r 1234567890123456789012345
Starting program: /home/pierre/example/align_2 1234567890123456789012345
Breakpoint 1, main (argc=2, argv=0xffffce94) at vuln.c:10
10 strcpy(buffer, argv[1]);
(gdb) i f
Stack level 0, frame at 0xffffce00:
eip = 0x804841c in main (vuln.c:10); saved eip = 0xf7e07647
source language c.
Arglist at 0xffffcdf8, args: argc=2, argv=0xffffce94
Locals at 0xffffcdf8, Previous frame's sp is 0xffffce00
Saved registers:
ebp at 0xffffcdf8, eip at 0xffffcdfc
(gdb) x/6x buffer
0xffffcde0: 0xf7fa23dc 0x080481fc 0x08048449 0x00000000
0xffffcdf0: 0xf7fa2000 0xf7fa2000
(gdb) x/x buffer+28
0xffffcdfc: 0xf7e07647
(gdb) c
Continuing.
Breakpoint 2, main (argc=2, argv=0xffffce94) at vuln.c:12
12 return 0;
(gdb) i f
Stack level 0, frame at 0xffffce00:
eip = 0x8048431 in main (vuln.c:12); saved eip = 0xf7e07647
source language c.
Arglist at 0xffffcdf8, args: argc=2, argv=0xffffce94
Locals at 0xffffcdf8, Previous frame's sp is 0xffffce00
Saved registers:
ebp at 0xffffcdf8, eip at 0xffffcdfc
(gdb) x/7x buffer
0xffffcde0: 0x34333231 0x38373635 0x32313039 0x36353433
0xffffcdf0: 0x30393837 0x34333231 0x00000035
(gdb) x/x buffer+28
0xffffcdfc: 0xf7e07647
(gdb) c
Continuing.
[Inferior 1 (process 6118) exited normally]
As expected, there is no segmentation fault here because I don't overwrite the eip.
I don't understand this difference in behavior. In both cases, the eip is not overwritten. The only difference is the size of the stack. What's happening?
Additional information:
The segmentation fault does not occur if dumb_function is not present.
$ gcc -v
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
$ uname -a
Linux pierre-Inspiron-5567 4.15.0-107-generic #108~16.04.1-Ubuntu SMP Fri Jun 12 02:57:13 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
You're not overwriting the saved eip, it's true. But you are overwriting a pointer that the function uses to find the saved eip. You can actually see this in your i f output: look at "Previous frame's sp" and notice how the two low bytes are 00 35; ASCII 0x35 is 5 and 00 is the terminating null. So although the saved eip is perfectly intact, the machine is fetching its return address from somewhere else, hence the crash.
In more detail:
GCC apparently doesn't trust the startup code to align the stack to 16 bytes, so it takes matters into its own hands (and $0xfffffff0,%esp). But it needs to keep track of the previous stack pointer value, so that it can find its parameters and the return address when needed. This is the lea 0x4(%esp),%ecx, which loads ecx with the address of the dword just above the saved eip on the stack. gdb calls this address "Previous frame's sp", I guess because it was the value of the stack pointer immediately before the caller executed its call main instruction. I will call it P for short.
After aligning the stack, the compiler pushes -0x4(%ecx), which is a copy of the return address (the saved eip sits 4 bytes below P, while the argc and argv parameters are at 0(%ecx) and 0x4(%ecx)). That copy is the 0xf7e07647 you found at buffer+36; the real saved eip is still at 0xffffcdfc. Then it sets up its stack frame with push %ebp; mov %esp, %ebp. We can keep track of all addresses relative to %ebp from now on, in the way compilers usually do when not optimizing.
The push %ecx a couple of lines down stores the address P on the stack at offset -0x8(%ebp). The sub $0x20, %esp makes 32 more bytes of space on the stack (ending at -0x28(%ebp)), but the question is, where in that space does buffer end up being placed? We see it happen after the call to dumb_function, with lea -0x20(%ebp), %eax; push %eax; this is the first argument to strcpy being pushed, which is buffer, so indeed buffer is at -0x20(%ebp), not at -0x28 as you might have guessed. Its 24 (=0x18) bytes therefore end exactly at -0x8(%ebp), so when you write 26 bytes there, the last two land at -0x8(%ebp), which is our stored P pointer.
It's all downhill from here. The corrupted value of P (call it Px) is popped into ecx, and just before the return, we do lea -0x4(%ecx), %esp. Now %esp is garbage and points somewhere bad, so the following ret is sure to lead to trouble. Maybe Px - 4 points to unmapped memory and just attempting to fetch the return address from there causes the fault. Maybe it points to readable memory, but the address fetched from that location does not point to executable memory, so the control transfer faults. Maybe the latter does point to executable memory, but the instructions located there are not the ones we want to be executing. In your run it was the second case: the dword at 0xffff0031 was readable and contained 0, so ret jumped to 0x00000000.
If you take out the call to dumb_function(), the stack layout changes slightly. It's no longer necessary to push ebx around the call to dumb_function(), so the P pointer from ecx now winds up at -4(%ebp), there are 4 bytes of unused space (to maintain alignment), and then buffer is at -0x20(%ebp). So your two-byte overrun goes into space that's not used at all, hence no crash.
Now consider the assembly generated with -mpreferred-stack-boundary=2. There is no need to re-align the stack, because the compiler does trust the startup code to align the stack to at least 4 bytes (it would be unthinkable for this not to be the case). The stack layout is simpler: push ebp, and subtract 24 more bytes for buffer. Thus your overrun overwrites two bytes of the saved ebp. This is eventually popped from the stack back into ebp, and so main returns to its caller with a value in ebp that is not the same as on entry. That's naughty, but it so happens that the system startup code doesn't use the value in ebp for anything (indeed in my tests it is set to 0 on entry to main, likely to mark the top of the stack for backtraces), and so nothing bad happens afterwards.