I have a small x86-64 assembly program which I compiled and linked in 2018. I am now trying to reproduce the build, but at the point of linking I get different results in the final binaries.
Both binaries were assembled and linked with the following commands:
$ nasm -f elf64 prng.asm; ld -s -o prng prng.o
The original ELF that I created in 2018 is named prng. The version I created today is named prng2. I have verified that the intermediate object files, prng.o, are identical, so I'm ruling out the source code or nasm as the cause of the differences I'm seeing. Below I've shown the output from objdump on each of the ELFs, old and new:
Original:
$ objdump -x prng
prng: file format elf64-x86-64
prng
architecture: i386:x86-64, flags 0x00000102:
EXEC_P, D_PAGED
start address 0x00000000004000b0
Program Header:
LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
filesz 0x0000000000000150 memsz 0x0000000000000150 flags r-x
LOAD off 0x0000000000000150 vaddr 0x0000000000600150 paddr 0x0000000000600150 align 2**21
filesz 0x0000000000000008 memsz 0x0000000000000008 flags rw-
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 000000a0 00000000004000b0 00000000004000b0 000000b0 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000008 0000000000600150 0000000000600150 00000150 2**2
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
no symbols
Latest:
$ objdump -x prng2
prng2: file format elf64-x86-64
prng2
architecture: i386:x86-64, flags 0x00000102:
EXEC_P, D_PAGED
start address 0x0000000000401000
Program Header:
LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**12
filesz 0x00000000000000e8 memsz 0x00000000000000e8 flags r--
LOAD off 0x0000000000001000 vaddr 0x0000000000401000 paddr 0x0000000000401000 align 2**12
filesz 0x00000000000000a0 memsz 0x00000000000000a0 flags r-x
LOAD off 0x0000000000002000 vaddr 0x0000000000402000 paddr 0x0000000000402000 align 2**12
filesz 0x0000000000000008 memsz 0x0000000000000008 flags rw-
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 000000a0 0000000000401000 0000000000401000 00001000 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000008 0000000000402000 0000000000402000 00002000 2**2
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
no symbols
I can see that the difference seems to be down to the different alignments. However, I cannot determine what has caused different alignments to be used.
I believe the version of ld will have changed between the two versions of Ubuntu. Is it likely that ld's behaviour on alignment changed during that time, e.g. with a different default linker script?
Or could the CPU be influencing choice of alignment values?
And why are there three sections of program header now, whereas there were only two before?
Modern ld puts the .rodata section in a separate read-without-exec page. That requires putting it in a separate ELF segment (a program header entry, which is what the loader reads). Terminology: ELF sections are the things listed under Sections, after the Program Header listing; segments are the entries in the Program Header listing.
Older ld put .rodata into the same segment as .text, read-only with exec. This did change within the last couple of years, maybe around 2018: as far as I know, binutils 2.31 introduced ld's -z separate-code option and made it the default on Linux/x86. (I've been using Arch GNU/Linux since about 2017, a rolling-release distro that mostly uses upstream sources unmodified, and it changed sometime around then IIRC.)
Older ld also put the ELF headers, and the initializers for .data, in the same disk page as the start of .text (for small files where .text and .data together came to less than 4 KiB). This disk page was mapped two different ways: Read + Exec for the text segment, at the virtual address used for code and read-only data, and Read + Write for the data segment, used for .data.
Note the entry point address of 0x00000000004000b0 (a small offset from the start of the page, just past the ELF header and program headers) vs. the page-aligned 0x0000000000401000 in the new executable. Page-aligning each segment's file offset lets it be mapped into virtual memory without anything that doesn't need to be executable overlapping into the executable segment. Page-aligned memory addresses are the natural consequence of that, but they're a side effect, not the goal.
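The addresses in both listings can be reproduced with a little arithmetic. An ELF64 header is 0x40 bytes and each program header is 0x38, so with two program headers .text starts at 0x40 + 2 * 0x38 = 0xb0, and with three the header-only segment is 0x40 + 3 * 0x38 = 0xe8 bytes (the filesz of prng2's r-- segment). The sketch below is my own simplified model of the two layouts, not ld's actual algorithm; packed_layout, separate_code_layout and BASE are hypothetical names, and it only handles a file with .text followed by .data.

```python
# A simplified model (an assumption, not ld's real algorithm) of the two
# layouts above, for a file with only .text followed by .data.
EHDR_SIZE = 0x40   # sizeof(Elf64_Ehdr)
PHDR_SIZE = 0x38   # sizeof(Elf64_Phdr)
BASE = 0x400000    # load address of the first segment in both files


def align_up(x: int, a: int) -> int:
    """Round x up to a multiple of a (a must be a power of two)."""
    return (x + a - 1) & ~(a - 1)


def packed_layout(text_size: int, n_phdrs: int = 2, max_page: int = 0x200000):
    """Old ld: headers and .text share the first page; .data follows on disk."""
    text_off = align_up(EHDR_SIZE + n_phdrs * PHDR_SIZE, 16)  # .text is 2**4 aligned
    data_off = align_up(text_off + text_size, 4)              # .data is 2**2 aligned
    # The data segment is mapped from the same file page, at the next
    # max-page-size boundary plus the same offset within the page.
    data_vaddr = align_up(BASE + data_off, max_page) + data_off % max_page
    return BASE + text_off, data_vaddr


def separate_code_layout(text_size: int, n_phdrs: int = 3, page: int = 0x1000):
    """New ld: headers, .text and .data each start on their own page."""
    hdr_size = EHDR_SIZE + n_phdrs * PHDR_SIZE  # the read-only first segment
    text_off = align_up(hdr_size, page)
    data_off = align_up(text_off + text_size, page)
    return hdr_size, BASE + text_off, BASE + data_off


print([hex(v) for v in packed_layout(0xa0)])         # ['0x4000b0', '0x600150']
print([hex(v) for v in separate_code_layout(0xa0)])  # ['0xe8', '0x401000', '0x402000']
```

With your 0xa0-byte .text, the first call gives the entry point and .data address of prng, and the second gives the header segment size, entry point and .data address of prng2.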
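Your executable doesn't have a .rodata section (and neither does your input), but the ELF headers themselves are still mapped by a LOAD segment (i.e. mapped into memory). That read-only segment holding just the headers is the extra first entry (flags r--, filesz 0xe8) in prng2's Program Header listing.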
BTW, prefer readelf over objdump for examining ELF headers; for example, readelf -lW prng2 prints the program headers along with a section-to-segment mapping.
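readelf is the practical tool, but if you want the same LOAD-segment information from a script (say, to compare prng and prng2 automatically), the program headers are easy to read directly. A minimal sketch, assuming a little-endian ELF64 file like yours; load_segments and Segment are hypothetical names, not a library API:

```python
import struct
from typing import List, NamedTuple

PT_LOAD = 1                 # p_type of a loadable segment
PF_X, PF_W, PF_R = 1, 2, 4  # p_flags permission bits


class Segment(NamedTuple):
    offset: int  # p_offset: where the bytes live in the file
    vaddr: int   # p_vaddr: where they are mapped in memory
    filesz: int
    memsz: int
    flags: str   # "r-x", "rw-", ... as objdump/readelf print them
    align: int


def load_segments(data: bytes) -> List[Segment]:
    """Return the PT_LOAD program headers of a little-endian ELF64 image."""
    if len(data) < 0x40 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # Elf64_Ehdr: e_phoff at 0x20, e_phentsize and e_phnum at 0x36 and 0x38
    (phoff,) = struct.unpack_from("<Q", data, 0x20)
    phentsize, phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for i in range(phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
        #             p_filesz, p_memsz, p_align
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, p_align) = struct.unpack_from(
            "<IIQQQQQQ", data, phoff + i * phentsize)
        if p_type != PT_LOAD:
            continue  # skip PT_NOTE, PT_GNU_STACK, ...
        flags = (("r" if p_flags & PF_R else "-")
                 + ("w" if p_flags & PF_W else "-")
                 + ("x" if p_flags & PF_X else "-"))
        segments.append(Segment(p_offset, p_vaddr, p_filesz, p_memsz, flags, p_align))
    return segments
```

For example, load_segments(open("prng2", "rb").read()) should return the same three LOAD segments objdump lists, with the first one flagged r--.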
This change helps protect against ROP and Spectre attacks by not making constant data available as "gadgets" to jump to. (Now that most programs make code injection impossible by ensuring W^X, more sophisticated attacks have to look for existing executable byte sequences. So the next step in hardening is to make executable only the pages that actually need to be.)
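To put rough numbers on this for your two files, here is a small sketch using the LOAD segments from the objdump listings as (flags, filesz) pairs. executable_bytes and violates_w_xor_x are hypothetical helpers; they count only the filesz each header declares, and since the kernel maps whole pages, the real exposure is at least this large.

```python
from typing import Iterable, Tuple

# Each segment is (flags, filesz) as printed by objdump/readelf, e.g. ("r-x", 0x150).
Seg = Tuple[str, int]


def executable_bytes(segments: Iterable[Seg]) -> int:
    """Bytes that the PT_LOAD headers ask to be mapped executable."""
    return sum(size for flags, size in segments if "x" in flags)


def violates_w_xor_x(segments: Iterable[Seg]) -> bool:
    """True if any single segment is both writable and executable."""
    return any("w" in flags and "x" in flags for flags, _ in segments)


old = [("r-x", 0x150), ("rw-", 0x8)]                # prng
new = [("r--", 0xe8), ("r-x", 0xa0), ("rw-", 0x8)]  # prng2
text_size = 0xa0                                    # size of .text in both

for name, segs in (("prng", old), ("prng2", new)):
    extra = executable_bytes(segs) - text_size
    print(f"{name}: {extra:#x} non-.text bytes executable, "
          f"W^X violated: {violates_w_xor_x(segs)}")
```

Neither layout has a segment that is writable and executable at once; the difference is that prng maps its 0xb0 bytes of ELF and program headers executable, while prng2 maps only .text that way.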
It has nothing to do with the CPU you're running on, or the one you built on. As @old_timer points out, you shouldn't expect identical binaries from different versions of the toolchain. Defaults can change like this for this or other reasons, and a tool might even embed a version signature in the output's metadata somewhere. (Compilers like GCC do that, in a .comment section; NASM and ld probably don't.)
You could build an old version of GNU binutils from source, or get an old ld from a binary package.
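Or maybe write your own linker script that puts .rodata in the same program segment as .text. ld does work from a built-in default linker script, which ld --verbose prints; if you can get the default script from an older ld (from its sources, or by running ld --verbose with that old binary), you might be able to use it with your current ld via -T. Current ld also accepts -z noseparate-code, which turns off the separate code segment and may get you closer to the old layout on its own.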
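If you want to try the default-linker-script route, GNU ld prints its built-in script as part of ld --verbose, between two lines of = signs. A minimal sketch to pull it out, assuming that output format; extract_script and default_linker_script are hypothetical helper names:

```python
import subprocess


def extract_script(verbose_output: str) -> str:
    """Return the text between the first two '=====' separator lines."""
    lines = verbose_output.splitlines()
    seps = [i for i, line in enumerate(lines)
            if len(line.strip()) >= 10 and set(line.strip()) == {"="}]
    if len(seps) < 2:
        raise ValueError("no built-in linker script found in ld --verbose output")
    return "\n".join(lines[seps[0] + 1:seps[1]])


def default_linker_script(ld: str = "ld") -> str:
    """Run `<ld> --verbose` and return its built-in default linker script."""
    result = subprocess.run([ld, "--verbose"], capture_output=True, text=True)
    return extract_script(result.stdout)
```

For example, you could pass the path of an older ld binary (wherever you build or unpack one; that path is up to you), save the result as old.ld, and then try ld -T old.ld -s -o prng prng.o. I haven't checked whether the script alone reproduces the old layout.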
User contributions licensed under CC BY-SA 3.0