How do I diagnose differences in GNU ld linker behaviour over time?

Question


I have a small x86-64 assembly program which I compiled and linked in 2018. I am now trying to reproduce the build, but at the point of linking I get different results in the final binaries.

Both files were assembled and linked using the following command:

$ nasm -f elf64 prng.asm; ld -s -o prng prng.o

The original ELF that I created in 2018 is named prng. The version I created today is named prng2. I have verified that the intermediate object files prng.o are identical, so I'm ruling out the source code or nasm as the cause of the differences I'm seeing. Below is the objdump output for each ELF, old and new:

Original:

$ objdump -x prng

prng:     file format elf64-x86-64
prng
architecture: i386:x86-64, flags 0x00000102:
EXEC_P, D_PAGED
start address 0x00000000004000b0

Program Header:
    LOAD off    0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
         filesz 0x0000000000000150 memsz 0x0000000000000150 flags r-x
    LOAD off    0x0000000000000150 vaddr 0x0000000000600150 paddr 0x0000000000600150 align 2**21
         filesz 0x0000000000000008 memsz 0x0000000000000008 flags rw-

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         000000a0  00000000004000b0  00000000004000b0  000000b0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000008  0000000000600150  0000000000600150  00000150  2**2
                  CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
no symbols

Latest:

$ objdump -x prng2

prng2:     file format elf64-x86-64
prng2
architecture: i386:x86-64, flags 0x00000102:
EXEC_P, D_PAGED
start address 0x0000000000401000

Program Header:
    LOAD off    0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**12
         filesz 0x00000000000000e8 memsz 0x00000000000000e8 flags r--
    LOAD off    0x0000000000001000 vaddr 0x0000000000401000 paddr 0x0000000000401000 align 2**12
         filesz 0x00000000000000a0 memsz 0x00000000000000a0 flags r-x
    LOAD off    0x0000000000002000 vaddr 0x0000000000402000 paddr 0x0000000000402000 align 2**12
         filesz 0x0000000000000008 memsz 0x0000000000000008 flags rw-

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         000000a0  0000000000401000  0000000000401000  00001000  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000008  0000000000402000  0000000000402000  00002000  2**2
                  CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
no symbols

I can see that the difference seems to be down to the different alignments. However, I cannot determine what has caused different alignments to be used.

I'm using Ubuntu 20.04.1 today, whereas in 2018 I was using Ubuntu 16.04.
I'm using an AMD Ryzen 3700X CPU today, whereas in 2018 I was using an Intel Core i7-860.

I believe the version of ld will have changed between the two versions of Ubuntu. Is it likely that ld's behaviour on alignment changed during that time, e.g. with a different default linker script?

Or could the CPU be influencing choice of alignment values?

And why are there three sections of program header now, whereas there were only two before?

Tags: assembly, x86-64, ld, memory-alignment

Asked on Stack Overflow Sep 2, 2020 by jl6

1 Answer

Modern ld puts the .rodata section in a separate read-without-exec page. That requires putting it in a separate ELF segment (a program header entry, which is what the loader reads). Terminology: ELF sections are the things listed under Sections, after the Program Header listing; segments are the entries in the Program Header listing, and one segment can contain several sections.
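To make the section/segment distinction concrete, here is a small Python sketch (not from the original answer) that assigns each section from the question's objdump -x output to the LOAD segment whose address range contains it, roughly what readelf -l prints as its "Section to Segment mapping". It is simplified: it compares only virtual addresses, and all values are copied by hand from the question.

```python
# (name, VMA, size) of each section and (vaddr, memsz, flags) of each LOAD segment,
# copied from the question's objdump -x output.
OLD_SECTIONS = [(".text", 0x4000B0, 0xA0), (".data", 0x600150, 0x8)]
OLD_LOADS = [(0x400000, 0x150, "r-x"), (0x600150, 0x8, "rw-")]

NEW_SECTIONS = [(".text", 0x401000, 0xA0), (".data", 0x402000, 0x8)]
NEW_LOADS = [(0x400000, 0xE8, "r--"), (0x401000, 0xA0, "r-x"), (0x402000, 0x8, "rw-")]


def section_to_segment(sections, segments):
    """Map each section name to (index, flags) of the segment whose range contains it."""
    mapping = {}
    for name, vma, size in sections:
        for index, (vaddr, memsz, flags) in enumerate(segments):
            if vaddr <= vma and vma + size <= vaddr + memsz:
                mapping[name] = (index, flags)
                break
    return mapping


def sectionless_segments(sections, segments):
    """Indices of segments that contain no section at all."""
    used = {index for index, _ in section_to_segment(sections, segments).values()}
    return [i for i in range(len(segments)) if i not in used]


if __name__ == "__main__":
    for label, secs, segs in (("old", OLD_SECTIONS, OLD_LOADS),
                              ("new", NEW_SECTIONS, NEW_LOADS)):
        print(label, section_to_segment(secs, segs),
              "segments with no sections:", sectionless_segments(secs, segs))
```

In the new binary, segment 0 (the r-- one) holds no section at all; it exists only to map the ELF headers.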

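Older ld put .rodata into the same segment as .text, mapped read-only with exec permission. This changed in 2018: GNU binutils 2.31 made ld -z separate-code the default for Linux/x86 targets. Ubuntu 16.04 ships binutils 2.26 and Ubuntu 20.04 ships 2.34, so your two builds fall on either side of that change. (I've been using Arch GNU/Linux since about 2017, a rolling-release distro that mostly uses upstream sources unmodified, and it changed sometime around then IIRC.)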

Older ld also put the ELF headers, and the initializers for .data, in the same disk page as the start of .text (for small files, where .text and .data together total less than 4 KiB). That disk page was mapped two different ways: read + exec for the text segment, at the virtual address used for code and read-only data, and read + write for the data segment, used for .data.

Note the entry point address of 0x00000000004000b0 (a small offset from the start of a page, right after the ELF header and program headers) vs. the page-aligned 0x0000000000401000 in the new executable. Aligning data on disk allows mapping it into virtual memory without anything that doesn't need to be executable ending up in the executable segment's pages. The natural consequence of that is page-aligned memory addresses, but that's a side effect, not the goal.
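The addresses in both listings follow from simple arithmetic on ELF64 structure sizes (a 0x40-byte file header and 0x38-byte program header entries) plus the ELF rule that a LOAD segment's p_vaddr and p_offset must be congruent modulo p_align. This Python sketch checks that against values copied from the question's objdump output; it illustrates the layout and is not code ld runs. It also shows why the old .data landed at 0x600150: that is 0x200000 (the old 2 MiB alignment) above 0x400150, keeping the same low bits as its file offset 0x150.

```python
# (p_offset, p_vaddr, p_align) of each LOAD segment, copied from the objdump -x output.
OLD_LAYOUT = [(0x000, 0x400000, 2**21), (0x150, 0x600150, 2**21)]
NEW_LAYOUT = [(0x0000, 0x400000, 2**12), (0x1000, 0x401000, 2**12), (0x2000, 0x402000, 2**12)]

ELF64_EHDR_SIZE = 0x40  # sizeof(Elf64_Ehdr)
ELF64_PHDR_SIZE = 0x38  # sizeof(Elf64_Phdr)
PAGE_SIZE = 0x1000      # 4 KiB x86-64 page


def congruent(offset, vaddr, align):
    """ELF requires a LOAD segment's p_vaddr and p_offset to agree modulo p_align."""
    return vaddr % align == offset % align


def headers_end(num_phdrs):
    """File offset just past the ELF header plus the program header table."""
    return ELF64_EHDR_SIZE + num_phdrs * ELF64_PHDR_SIZE


if __name__ == "__main__":
    for label, segs in (("old", OLD_LAYOUT), ("new", NEW_LAYOUT)):
        print(label, "congruent:", all(congruent(*s) for s in segs))
    # Old layout: .text starts right after 2 program headers, so entry = 0x400000 + 0xb0.
    print(hex(0x400000 + headers_end(2)))  # 0x4000b0
    # New layout: the r-- segment holds only the header + 3 program headers.
    print(hex(headers_end(3)))  # 0xe8, matching that segment's filesz
    # Old layout: headers/.text (offset 0) and .data (offset 0x150) share disk page 0.
    print(0x000 // PAGE_SIZE == 0x150 // PAGE_SIZE)  # True
```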

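Your executable file doesn't have a .rodata section (and neither does your input), but the ELF headers themselves are still mapped by a PT_LOAD segment (one the loader maps into memory). That read-only, non-executable segment holding just the headers is why you now see three program header entries instead of two.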

BTW, prefer readelf over objdump for examining ELF headers; for example, readelf -lW prng2 shows the program headers along with a section-to-segment mapping.
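readelf remains the right tool, but to show what it is reading, here is a minimal Python sketch that decodes the program header table directly with struct. It assumes a little-endian ELF64 file, as on x86-64 Linux, and the script name (say, phdrs.py) is arbitrary. Run against prng or prng2, it should print one line per LOAD entry, matching the objdump listings in the question.

```python
import struct
import sys

PT_LOAD = 1
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def flags_str(p_flags):
    """Render p_flags the way objdump does, e.g. 'r-x'."""
    return (("r" if p_flags & PF_R else "-")
            + ("w" if p_flags & PF_W else "-")
            + ("x" if p_flags & PF_X else "-"))


def program_headers(data):
    """Return (p_type, flags, p_offset, p_vaddr, p_filesz, p_memsz, p_align) tuples.

    Assumes a little-endian ELF64 file (EI_CLASS == 2, EI_DATA == 1).
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)             # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)  # entry size and count
    headers = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, p_align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        headers.append((p_type, flags_str(p_flags), p_offset, p_vaddr,
                        p_filesz, p_memsz, p_align))
    return headers


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        elf = f.read()
    for p_type, flags, off, vaddr, filesz, memsz, align in program_headers(elf):
        if p_type == PT_LOAD:
            print(f"LOAD off {off:#x} vaddr {vaddr:#x} filesz {filesz:#x} "
                  f"memsz {memsz:#x} flags {flags} align {align:#x}")
```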

This change helps protect against ROP and Spectre attacks by not making constant data available as "gadgets" to jump to. (Now that most programs make code-injection impossible by ensuring W^X, more sophisticated attacks have to look for existing executable byte sequences. So the next step in hardening is making as few pages as possible executable that don't need to be.)
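As a rough illustration of "as few executable bytes as possible", this Python sketch totals the file bytes each binary places in an executable LOAD segment and checks for segments that are both writable and executable. The flags and sizes come from the question's objdump output, and the function names are made up for this example.

```python
# (flags, p_filesz) of each LOAD segment, copied from the question's objdump -x output.
OLD_PERMS = [("r-x", 0x150), ("rw-", 0x8)]
NEW_PERMS = [("r--", 0xE8), ("r-x", 0xA0), ("rw-", 0x8)]


def executable_bytes(segments):
    """Sum the file bytes placed in executable segments.

    This counts p_filesz only. The kernel maps whole pages, so in the old layout the
    rest of file page 0 (including the .data initializer at offset 0x150) is also
    mapped, executable, through the r-x mapping.
    """
    return sum(size for flags, size in segments if "x" in flags)


def wx_segments(segments):
    """Segments that are both writable and executable (W^X violations)."""
    return [flags for flags, _ in segments if "w" in flags and "x" in flags]


if __name__ == "__main__":
    print(hex(executable_bytes(OLD_PERMS)))  # 0x150: ELF headers + .text
    print(hex(executable_bytes(NEW_PERMS)))  # 0xa0: .text only
    print(wx_segments(OLD_PERMS), wx_segments(NEW_PERMS))  # [] []
```

Both layouts already satisfy W^X; the difference is that the new one stops mapping the 0xb0 bytes of ELF headers (and, at page granularity, the .data initializer) as executable.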

It has nothing to do with the CPU you're running on, or the one you built on. As @old_timer points out, you shouldn't expect identical binaries from different versions of the toolchain. Defaults can change like this, for security or other reasons, and a tool could even embed a version string somewhere in the metadata. (Compilers like GCC do that; NASM and ld probably don't.)

You could build an old version of GNU binutils from source, or get an old ld from a binary package.

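Alternatively, try linking with ld -z noseparate-code -s -o prng prng.o, which turns off the separate code segment. That should bring back the older layout with the headers and .text in one segment, though the result isn't guaranteed to be byte-identical. Or write your own linker script that puts .rodata in the same program segment as .text. (ld works from a default linker script, which ld --verbose prints; if you can find the default linker script in older ld sources, you might be able to use it with the current ld you have installed.)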

Answered on Stack Overflow Sep 2, 2020 by Peter Cordes; edited Sep 3, 2020 by Peter Cordes

User contributions licensed under CC BY-SA 3.0