Cannot load custom ELF executable in GDB

I am currently writing a compiler (http://curly-lang.org if you're curious), and have been encountering a strange bug when trying to run the generated ELF binaries on the latest Linux kernel. The same binaries run fine on older kernels (I've tried on several Ubuntu boxes, uname 4.4.0-1049-aws), but on my updated Arch box (uname 4.17.11-arch1), I can't even open them under GDB.

The error message given by GDB is "During startup program terminated with signal SIGSEGV, Segmentation fault", which, as I understand it, indicates a failure to load the program segments before the first instruction is ever run.

I compiled a minimal ELF executable with GCC/NASM to try and reproduce the problem, but the GCC-produced executable loads without a hitch, whereas my programs definitely do not.

Here are the printouts of readelf -a for both executables, for reference. The first is the program generated by my compiler:

$ readelf -a my-program
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400040
  Start of program headers:          2400 (bytes into file)
  Start of section headers:          2568 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         3
  Size of section headers:           64 (bytes)
  Number of section headers:         6
  Section header string table index: 1

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1]                   STRTAB           0000000000000000  00000b88
       000000000000009e  0000000000000000           0     0     0
  [ 2] .init             PROGBITS         0000000000400040  00000040
       000000000000006b  0000000000000000  AX       0     0     0
  [ 3] .text             PROGBITS         00000000004000ab  000000ab
       0000000000000824  0000000000000000  AX       0     0     0
  [ 4] .data             PROGBITS         00000000008008cf  000008cf
       0000000000000091  0000000000000000  WA       0     0     0
  [ 5] .symtab           SYMTAB           0000000000000000  00000c26
       00000000000000c0  0000000000000018           1     8     0
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000040 0x0000000000400040 0x0000000000000000
                 0x000000000000006b 0x000000000000006b  R E    0x1000
  LOAD           0x00000000000000ab 0x00000000004000ab 0x0000000000000000
                 0x0000000000000824 0x0000000000000824  R E    0x1000
  LOAD           0x00000000000008cf 0x00000000008008cf 0x0000000000000000
                 0x0000000000000091 0x0000000000000091  RW     0x1000

 Section to Segment mapping:
  Segment Sections...
   00     .init 
   01     .text 
   02     .data 

There is no dynamic section in this file.

There are no relocations in this file.

The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.

Symbol table '.symtab' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 00000000004004e0     0 NOTYPE  LOCAL  HIDDEN     3 .text.argument
     1: 00000000004005a0     0 NOTYPE  LOCAL  HIDDEN     3 .text.constant
     2: 0000000000400270     0 NOTYPE  LOCAL  HIDDEN     3 .text.memextend-page
     3: 0000000000400210     0 NOTYPE  LOCAL  HIDDEN     3 .text.memextend-pool-32
     4: 00000000004002b0     0 NOTYPE  LOCAL  HIDDEN     3 .text.unit
     5: 00000000004005f0     0 NOTYPE  LOCAL  HIDDEN     3 .text.write
     6: 00000000008008d0     0 NOTYPE  LOCAL  HIDDEN     4 .data.brkaddr
     7: 0000000000400040     0 NOTYPE  LOCAL  HIDDEN     2 .init.brkaddr-init

No version information found in this file.

And for the GCC-generated program:

$ readelf -a gcc-program
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400110
  Start of program headers:          64 (bytes into file)
  Start of section headers:          336 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         3
  Size of section headers:           64 (bytes)
  Number of section headers:         5
  Section header string table index: 4

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .note.gnu.build-i NOTE             00000000004000e8  000000e8
       0000000000000024  0000000000000000   A       0     0     4
  [ 2] .text             PROGBITS         0000000000400110  00000110
       0000000000000010  0000000000000000  AX       0     0     16
  [ 3] .data             PROGBITS         0000000000600120  00000120
       0000000000000001  0000000000000000  WA       0     0     4
  [ 4] .shstrtab         STRTAB           0000000000000000  00000121
       000000000000002a  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000120 0x0000000000000120  R E    0x200000
  LOAD           0x0000000000000120 0x0000000000600120 0x0000000000600120
                 0x0000000000000001 0x0000000000000001  RW     0x200000
  NOTE           0x00000000000000e8 0x00000000004000e8 0x00000000004000e8
                 0x0000000000000024 0x0000000000000024  R      0x4

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.build-id .text 
   01     .data 
   02     .note.gnu.build-id 

There is no dynamic section in this file.

There are no relocations in this file.

The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.

No version information found in this file.

Displaying notes found in: .note.gnu.build-id
  Owner                 Data size   Description
  GNU                  0x00000014   NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 1a3e678b08996ee6a9d289c3f76c7c52cd4a30aa

As you can see, I've tried to mirror GCC's segment placement (~0x400000 for the code, and ~0x800000 for the data), and the two ELF headers are identical apart from the entry point and the offsets, counts and indices of the header tables. The only meaningful difference I can think of is that my custom binaries have two LOAD segments (one for the initialization code, one for the rest) that share the same page, whereas GCC only produces a single code LOAD segment. That shouldn't pose a problem, though, since they both have the same permissions and don't overlap.

Other than that, I do not see what could possibly keep the first program from loading correctly. If anyone well-versed in the arcana of the Linux ELF loader could enlighten me, I would greatly appreciate it.

Thank you for your attention.

linux
gdb
elf
asked on Stack Overflow Aug 2, 2018 by Marc Coiffier • edited Aug 2, 2018 by Marc Coiffier

1 Answer

Never mind, everyone: it was the page-sharing segments that were causing the problem all along.

Since I suspected the kernel loader, I should have thought of running dmesg much earlier; I would have noticed the following message, clear as day:

    [54178.211348] 12766 (my-program): Uhuuh, elf segment at 0000000000400000 requested but the memory is mapped already

Apparently, some benevolent mastermind decided 3 months ago that it would be good to actually catch double-mapping errors instead of just letting them go silently as we always did in the ELF loader.

It's not that my binaries used to be correct, it's that the error they were causing wasn't caught before. I don't know if I should be proud or ashamed for my bugs to have eluded detection all this time.
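
The address in the kernel message is 0x400000 rather than either segment's VirtAddr, which is consistent with mappings being made in whole pages. A minimal sketch of that rounding, assuming 4 KiB pages (the 0x1000 alignment shown in my program headers); page_start is a name made up for this illustration, not a kernel function:

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, as on typical x86-64 Linux


def page_start(addr: int, page_size: int = PAGE_SIZE) -> int:
    """Round an address down to the start of the page that contains it."""
    return addr & ~(page_size - 1)


# VirtAddr values of the two code LOAD segments from the readelf output
init_vaddr = 0x400040  # .init segment
text_vaddr = 0x4000AB  # .text segment

print(hex(page_start(init_vaddr)))  # 0x400000
print(hex(page_start(text_vaddr)))  # 0x400000
```

Both segments round down to the same page, so the second mapping request lands on memory the first one already claimed.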

Anyway, I leave this answer to warn anyone foolish enough to map multiple segments on a single page in an ELF binary: do not. There is no try.
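
For anyone generating ELF files by hand, a check along these lines could catch the mistake before the kernel does. This is only a sketch: the segment lists are copied by hand from the (VirtAddr, MemSiz) columns of the two readelf outputs in the question, the 4 KiB page size is an assumption, and page_span and shared_pages are hypothetical helper names.

```python
from itertools import combinations

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def page_span(vaddr, memsz, page_size=PAGE_SIZE):
    """Return the half-open range [first, last) of page-aligned addresses a segment touches."""
    first = vaddr & ~(page_size - 1)
    last = (vaddr + memsz + page_size - 1) & ~(page_size - 1)
    return first, last


def shared_pages(segments, page_size=PAGE_SIZE):
    """Return index pairs of LOAD segments whose page spans intersect."""
    spans = [page_span(vaddr, memsz, page_size) for vaddr, memsz in segments]
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(spans), 2)
        if a[0] < b[1] and b[0] < a[1]
    ]


# (VirtAddr, MemSiz) of each LOAD segment, taken from the readelf output
MY_PROGRAM = [(0x400040, 0x6B), (0x4000AB, 0x824), (0x8008CF, 0x91)]
GCC_PROGRAM = [(0x400000, 0x120), (0x600120, 0x1)]

print(shared_pages(MY_PROGRAM))   # [(0, 1)]
print(shared_pages(GCC_PROGRAM))  # []
```

The first two segments of my program collide on the page at 0x400000, while the GCC layout keeps each LOAD segment on its own pages. Merging .init and .text into a single LOAD segment, as GCC does for its code, is one way to make the list come back empty.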

PS: @rodrigo: Thanks for your answer, I didn't even notice the PhysAddr values before you pointed them out. The manual says that they're used "on systems where physical addressing is relevant", which doesn't seem to be the case here, but I'll remember to keep a lookout for them next time.

answered on Stack Overflow Aug 2, 2018 by Marc Coiffier • edited Aug 3, 2018 by Marc Coiffier

User contributions licensed under CC BY-SA 3.0