gcc -fstack-limit-(symbol|register): what does "a signal is raised at run time" mean?

From the GCC docs:

Generate code to ensure that the stack does not grow beyond a certain value, either the value of a register or the address of a symbol. If a larger stack is required, a signal is raised at run time. For most targets, the signal is raised before the stack overruns the boundary, so it is possible to catch the signal without taking special precautions.

For instance, if the stack starts at absolute address ‘0x80000000’ and grows downwards, you can use the flags -fstack-limit-symbol=__stack_limit and -Wl,--defsym,__stack_limit=0x7ffe0000 to enforce a stack limit of 128KB. Note that this may only work with the GNU linker.

You can locally override stack limit checking by using the no_stack_limit function attribute (see Function Attributes).
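The 128KB figure in the quoted example follows from subtracting the limit symbol's address from the stack's starting address. A minimal sketch of that arithmetic, where `stack_budget_bytes` is a hypothetical helper name and not part of GCC or the linker:

```c
#include <stdint.h>

/* Hypothetical helper (not a GCC or linker API): number of bytes available
   to a downward-growing stack that starts at stack_top and must not go
   below stack_limit. */
static uint64_t stack_budget_bytes(uint64_t stack_top, uint64_t stack_limit)
{
    return stack_top - stack_limit;
}
```

Calling `stack_budget_bytes(0x80000000, 0x7ffe0000)` gives 0x20000 bytes, which is 128 * 1024.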

What is the "signal"? Does it require some OS interface? Should there be documentation as to what ID this signal has so that I can catch it?

gcc
stack-overflow
instrumentation
asked on Stack Overflow Dec 4, 2017 by Bob • edited Dec 4, 2017 by Bob

1 Answer

I have just run into the same question myself.

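On x86 the signal is SIGILL (illegal instruction). GCC maps the trap_if RTL primitive to the x86 ud2 instruction, which raises an invalid-opcode exception when executed; on Linux the kernel delivers that exception to the process as SIGILL. So yes, an OS is needed to turn the trap into a signal.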
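One way to catch that signal is sketched below, assuming a POSIX system. The names `run_guarded`, `on_sigill`, `simulated_overflow` and `well_behaved` are made up for illustration, and `raise(SIGILL)` only stands in for the ud2 that instrumented code would execute. The handler leaves via siglongjmp because returning normally from a SIGILL caused by ud2 would resume at the same faulting instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical recovery point used by the handler below. */
static sigjmp_buf recover_point;

/* SIGILL handler: jump back to the recovery point instead of returning. */
static void on_sigill(int signo)
{
    (void)signo;
    siglongjmp(recover_point, 1);
}

/* Runs fn() with a temporary SIGILL handler installed.
   Returns 1 if SIGILL was caught, 0 if fn() returned normally,
   -1 if the handler could not be installed. */
static int run_guarded(void (*fn)(void))
{
    struct sigaction sa, old;
    int trapped;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return -1;

    /* savemask = 1 so that siglongjmp restores the signal mask. */
    if (sigsetjmp(recover_point, 1) == 0) {
        fn();
        trapped = 0;
    } else {
        trapped = 1;
    }

    sigaction(SIGILL, &old, NULL);   /* restore the previous disposition */
    return trapped;
}

/* Stand-in for a function whose stack check fails (assumption: the real
   trap would be a ud2 executed by the instrumented prologue). */
static void simulated_overflow(void) { raise(SIGILL); }

/* Stand-in for a function that stays within the limit. */
static void well_behaved(void) { }
```

With these stand-ins, `run_guarded(simulated_overflow)` reports a caught trap and `run_guarded(well_behaved)` does not. On targets where the trap is not delivered as SIGILL, the signal number would need to change.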

For example, here is some source code:

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

#ifndef N
#define N 100000
#endif

register uintptr_t stack_limit asm ("r12");

void foo();

__attribute__((no_stack_limit))
int main(int argc, char **argv) {
  int ret = 0;
  stack_limit = (uintptr_t)(&ret - 1000);  /* 1000 ints (4000 bytes on x86-64) below ret */

  foo();
  return ret;
}

void foo()
{
  char *ch = (char *)alloca(N);  /* alloca() is declared in <alloca.h>, which glibc's <stdlib.h> includes by default */
  printf("Now in foo\n");
}

Compiled with gcc -ggdb -O0 -fstack-limit-register=r12 -o stack-lim stack-lim.c -fstack-usage -DN=3922

Experimentally, -DN=3921 runs fine and -DN=3922 triggers the trap (on my machine), which is in the right ballpark for a limit placed 1000 ints (4000 bytes) below ret, considering how crude the setup is.

And here is the resulting disassembly in gdb. Notice that the ud2 instruction is executed conditionally: the jae just before it jumps over ud2 when the remaining space (rsp - r12) is at least the allocation size in rax.

   ...
B+>│0x555555554707 <foo+8>  mov    %fs:0x28,%rax
   │0x555555554710 <foo+17> mov    %rax,-0x8(%rbp)
   │0x555555554714 <foo+21> xor    %eax,%eax
   │0x555555554716 <foo+23> mov    $0x10,%eax
   │0x55555555471b <foo+28> sub    $0x1,%rax
   │0x55555555471f <foo+32> add    $0xf62,%rax
   │0x555555554725 <foo+38> mov    $0x10,%ecx
   │0x55555555472a <foo+43> mov    $0x0,%edx
   │0x55555555472f <foo+48> div    %rcx
   │0x555555554732 <foo+51> imul   $0x10,%rax,%rax
   │0x555555554736 <foo+55> mov    %rsp,%rdx
   │0x555555554739 <foo+58> sub    %r12,%rdx
   │0x55555555473c <foo+61> cmp    %rax,%rdx
   │0x55555555473f <foo+64> jae    0x555555554743 <foo+68>
   │0x555555554741 <foo+66> ud2
   │0x555555554743 <foo+68> sub    %rax,%rsp
   │0x555555554746 <foo+71> mov    %rsp,%rax
   ...
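The check in that listing can be modelled in C roughly as follows. This is a reading of the disassembly above, not GCC source; `round_up_16` and `would_trap` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the size computation in the listing: add 16 - 1, divide by 16,
   multiply by 16, i.e. round the requested size up to a multiple of 16. */
static uintptr_t round_up_16(uintptr_t n)
{
    return (n + 16 - 1) / 16 * 16;
}

/* Mirrors "sub %r12,%rdx; cmp %rax,%rdx; jae": the jump skips ud2 when
   sp - limit >= alloc_size, compared as unsigned values. Returns true
   when the ud2 would be reached. */
static bool would_trap(uintptr_t sp, uintptr_t limit, uintptr_t alloc_size)
{
    return (sp - limit) < alloc_size;
}
```

In the listing the constant is 0xf62 (3938), which `round_up_16` turns into 3952, so the trap fires whenever fewer than 3952 bytes remain between rsp and r12.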

And here is the relevant snippet of the same assembly, annotated with RTL by using the -dP and -fdump-rtl-final options, like so:

$ gcc -ggdb -O0 -fstack-limit-register=r12 -S -o stack-lim.s stack-lim.c -fstack-usage -DN=3922 -dP -fdump-rtl-final

Notice the (trap_if) becomes ud2:

...
#(insn 10 39 11 2 (parallel [
#            (set (reg:DI 1 dx [94])
#                (minus:DI (reg:DI 1 dx [94])
#                    (reg:DI 41 r12)))
#            (clobber (reg:CC 17 flags))
#        ]) "stack-lim.c":24 274 {*subdi_1}
#     (nil))
        subq    %r12, %rdx      # 10    *subdi_1/1      [length = 3]
#(insn 11 10 12 2 (set (reg:CC 17 flags)
#        (compare:CC (reg:DI 1 dx [94])
#            (reg:DI 0 ax [92]))) "stack-lim.c":24 8 {*cmpdi_1}
#     (nil))
        cmpq    %rax, %rdx      # 11    *cmpdi_1/1      [length = 3]
#(jump_insn 12 11 31 2 (set (pc)
#        (if_then_else (geu (reg:CC 17 flags)
#                (const_int 0 [0]))
#            (label_ref 15)
#            (pc))) "stack-lim.c":24 627 {*jcc_1}
#     (nil)
# -> 15)
        jnb     .L5     # 12    *jcc_1  [length = 2]
#(insn 13 31 14 3 (trap_if (const_int 1 [0x1])
#        (const_int 6 [0x6])) "stack-lim.c":24 1005 {trap}
#     (nil))
        ud2     # 13    trap    [length = 2]
.L5:
#(insn 16 32 17 4 (parallel [
#            (set (reg/f:DI 7 sp)
#                (minus:DI (reg/f:DI 7 sp)
#                    (reg:DI 0 ax [92])))
#            (clobber (reg:CC 17 flags))
#        ]) "stack-lim.c":24 274 {*subdi_1}
#     (nil))
        subq    %rax, %rsp      # 16    *subdi_1/1      [length = 3]
#(insn 17 16 18 4 (set (reg:DI 0 ax [93])
#        (reg/f:DI 7 sp)) "stack-lim.c":24 81 {*movdi_internal}
#     (nil))
        movq    %rsp, %rax      # 17    *movdi_internal/4       [length = 3]
...

I suspect it is not well documented because the signal that gets raised, and indeed whether the feature is implemented at all, is machine-specific. I do agree, though, that the behavior for each target ought to be documented.

answered on Stack Overflow Mar 3, 2020 by Maxim Blinov

User contributions licensed under CC BY-SA 3.0