I came across this useful feature in ELF binaries: the Build ID. "It ... is (normally) the SHA1 hash over all code sections in the ELF image." One can read it with a GNU utility:
$ readelf -n /bin/bash
...
Displaying notes found at file offset 0x00000274 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 54967822da027467f21e65a1eac7576dec7dd821
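For reference, the note that readelf decodes has a simple layout: a 12-byte header (namesz, descsz, type), the owner name "GNU\0" padded to 4 bytes, and then the descriptor, which is the build ID itself. In the output above, the 0x14 (20) byte descriptor plus the 12 header bytes and 4 name bytes gives the 0x24 (36) byte length. Below is a minimal Python sketch that extracts the ID by scanning SHT_NOTE sections. The function names are my own, and it assumes a well-formed ELF with a conventional section header table (no extended section numbering) and 4-byte-aligned notes. It only reads the stored value; it verifies nothing.

```python
import struct

SHT_NOTE = 7         # sh_type value of note sections
NT_GNU_BUILD_ID = 3  # note type of the GNU build ID

def iter_notes(blob, endian):
    """Yield (name, type, desc) for each note in a raw note-section blob."""
    off = 0
    while off + 12 <= len(blob):
        namesz, descsz, ntype = struct.unpack_from(endian + "III", blob, off)
        off += 12
        name = blob[off:off + namesz].rstrip(b"\0")
        off += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = blob[off:off + descsz]
        off += (descsz + 3) & ~3  # so is the descriptor
        yield name, ntype, desc

def read_build_id(data):
    """Return the GNU build ID of an ELF image (bytes) as hex, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    if data[4] == 2:  # EI_CLASS: 2 = ELF64
        (shoff,) = struct.unpack_from(endian + "Q", data, 0x28)
        shentsize, shnum = struct.unpack_from(endian + "HH", data, 0x3A)
        sh_fmt = endian + "IIQQQQ"  # name, type, flags, addr, offset, size
    else:  # ELF32
        (shoff,) = struct.unpack_from(endian + "I", data, 0x20)
        shentsize, shnum = struct.unpack_from(endian + "HH", data, 0x2E)
        sh_fmt = endian + "IIIIII"
    for i in range(shnum):
        _, sh_type, _, _, offset, size = struct.unpack_from(
            sh_fmt, data, shoff + i * shentsize)
        if sh_type != SHT_NOTE:
            continue
        for name, ntype, desc in iter_notes(data[offset:offset + size], endian):
            if name == b"GNU" and ntype == NT_GNU_BUILD_ID:
                return desc.hex()
    return None
```

For example, `read_build_id(open("/bin/bash", "rb").read())` should return the same hex string that `readelf -n` prints.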
I wonder whether there is an easy way to recompute the Build ID yourself, for example to check that it isn't corrupted.
So, I got an answer from Mark. Since it is up-to-date information, I'm posting it here. But basically you guys are right: there is indeed no tool for computing the Build-ID, and the intention of Build-ID is not (1) identifying the file contents, and not even (2) identifying the executable (code) part of it, but (3) capturing the "semantic meaning" of a build, which is the part that is hard to formalize. (The numbers are for self-reference.)
Quote from the email:
-- "Is there a user tool recomputing the build-id from the file itself, to check if it's not corrupted/compromised somehow etc?" If you have time, maybe you could post an answer there?
Sorry, I don't have a Stack Overflow account. But the answer is: no, there is no such tool, because the precise way a build-id is calculated isn't specified. It just has to be universally unique. Even the precise length of the build-id isn't specified. A build-id could be calculated in various ways, using different hashing algorithms, to get a universally unique value. And not all of the data might still be in the ELF file to recalculate it, even if you knew how it was created originally.
Apparently, the intention of Build-ID has changed since the Fedora Feature page about it was written, and people's opinions diverge on what it is now. Maybe in your answer you could also include the status of Build-ID and what it is now?
I think things weren't very precisely formulated. If a tool changes the build that creates the ELF file so that it isn't a "semantically identical" binary anymore then it should get a new (recalculated) build-id. But if a tool changes something about the file that still results in a "semantically identical" binary then the build-id stays the same.
What isn't precisely defined is what "semantically identical binary" means. The intention is that it captures everything that a build was made from. So if the source files used to generate a binary are different then you expect different build-ids, even if the binary code produced might happen to be the same.
This is why when calculating the build-id of a file through a hash algorithm you use not just the (allocated) code sections, but also the debuginfo sections (which will contain references to the source file names).
But if you then for example strip the debuginfo out (and put it into a separate file) then that doesn't change the build-id (the file was still created from the same build).
This is also why, even if you knew the precise hashing algorithm used to calculate the build-id, you might not be able to recalculate the build-id. Because you might be missing some of the original data used in the hashing algorithm to calculate the build-id.
Feel free to share this answer with others.
Cheers,
Mark
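To make Mark's point about stripped debuginfo concrete, here is a toy Python sketch. The hashing scheme (SHA-1 over section names and contents, in sorted order) and the section contents are hypothetical; this is not what GNU ld or any other linker actually does. It only shows why a hash that covered debuginfo cannot be recomputed once that debuginfo has been split off, and why a different source path yields a different ID even when the machine code is identical.

```python
import hashlib

def toy_build_id(sections):
    """Hypothetical scheme: SHA-1 over section names and contents, sorted by name.
    Not the algorithm of any real linker."""
    h = hashlib.sha1()
    for name in sorted(sections):
        h.update(name.encode())
        h.update(sections[name])
    return h.hexdigest()

# Made-up contents of an unstripped binary: code plus debuginfo.
unstripped = {
    ".text": b"\x55\x48\x89\xe5\xc3",
    ".debug_info": b"DW_AT_name: main.c",
    ".debug_line": b"/home/user/src/main.c",
}
original_id = toy_build_id(unstripped)

# Stripping moves .debug_* into a separate file; the stored ID is kept as-is,
# but recomputing from what is left in the binary gives a different value.
stripped = {k: v for k, v in unstripped.items() if not k.startswith(".debug")}
print(toy_build_id(stripped) == original_id)  # False

# Same machine code built from a different source path: a different ID.
moved = {**unstripped, ".debug_line": b"/tmp/other/main.c"}
print(toy_build_id(moved) == original_id)     # False
```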
Also, for people interested in debuginfo (Linux performance and tracing, anyone?), he mentioned a couple of projects for managing it on Fedora:
The build ID is not a hash of the program but a unique identifier for the build, and it should be considered just a "unique blob". At some point it was defined as a hash of a timestamp and the absolute file path, but that is not a guarantee of stability either.
I wonder if there is an easy way to recompute Build ID yourself?
No, there isn't, by design.
The page you linked to itself links to the original description of what build-id is and what it's usable for. That page says:
But I'd like to specify it explicitly as being a unique identifier good only for matching, not any kind of checksum that can be verified against the contents.
(There are external general means for content verification, and I don't think debuginfo association needs to do that.)
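As an illustration of "good only for matching": debuggers use the ID purely as a lookup key. GDB, for instance, looks for a separate debug file under its debug directory at `.build-id/`, followed by the first two hex digits, a slash, and the remaining digits plus `.debug`. The sketch below assumes `/usr/lib/debug` as that directory (the usual default, though it can be configured differently), and the function names are hypothetical. Nothing here checks file contents; a match is just equality of the two IDs.

```python
from pathlib import PurePosixPath

def debug_file_path(build_id_hex, debug_dir="/usr/lib/debug"):
    """GDB-style location of the separate debuginfo file for a build ID."""
    bid = build_id_hex.lower()
    return PurePosixPath(debug_dir, ".build-id", bid[:2], bid[2:] + ".debug")

def ids_match(binary_id, debuginfo_id):
    """Association is a plain comparison; nothing about the contents is verified."""
    return binary_id.lower() == debuginfo_id.lower()

print(debug_file_path("54967822da027467f21e65a1eac7576dec7dd821"))
# /usr/lib/debug/.build-id/54/967822da027467f21e65a1eac7576dec7dd821.debug
```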
An additional complication is that the linker accepts any of:
--build-id
--build-id=sha1
--build-id=md5
--build-id=0xhexstring
So the build ID is not necessarily a SHA-1 sum to begin with.
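Because of these options, even the length of the ID tells you little. A 20-byte ID is what `sha1` produces (GNU ld's choice when no style is given), a 16-byte ID could come from `md5` or from GNU ld's `uuid` style (random bits), and a literal hex string can have any length. The hypothetical helper below lists the candidate styles for a given ID; even when a candidate is right, you still would not know which inputs were hashed.

```python
def candidate_styles(build_id_hex):
    """List the GNU ld --build-id styles that could yield an ID of this length."""
    n = len(bytes.fromhex(build_id_hex))
    styles = []
    if n == 20:
        styles.append("sha1")        # also what plain --build-id selects
    elif n == 16:
        styles += ["md5", "uuid"]    # uuid: 128 random bits, nothing to recompute
    styles.append("0xhexstring")     # a user-supplied literal can be any length
    return styles

print(candidate_styles("54967822da027467f21e65a1eac7576dec7dd821"))
# ['sha1', '0xhexstring']
```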