How to know about all the files which are compiled together to make an executable?

-2

We are looking for a procedure to easily list all the files that are compiled together to make an executable.

Use case: Suppose we have a large repository and want to know which files in the repository are compiled to make an executable (e.g. a.out).

For example :

dwarfdump a.out | grep "NS uri"
0x0000064a  [   9, 0] NS uri: "/home/main.c"
0x000006dd  [   2, 0] NS uri: "/home/zzzz.c"
0x000006f1  [   2, 0] NS uri: "/home/yyyy.c"
0x00000705  [   2, 0] NS uri: "/home/xxxx.c"
0x00000719  [   2, 0] NS uri: "/home/wwww.c"

But it doesn't list all the header files. Please suggest.

gdb
decompiler
asked on Stack Overflow Dec 7, 2017 by M Gem • edited Dec 8, 2017 by M Gem

1 Answer

3

How to Extract Source Code From Executable with Debug Symbol Available ?

You cannot do that. I guess you are on Linux/x86-64 (and your question is operating system and ABI specific, and debugging format specific). Of course, you should pass -g (or even -g3) to all the gcc compilation commands for your executable. Without that -g or -g3 option used to compile every translation unit (including perhaps those of shared libraries!) you might not have enough information.

Even with debug information in DWARF format, the ELF executable doesn't contain source code, only references to source code (e.g. source file path, position as line and column numbers). So the debug information contains stuff like file src/foo.c, line 34 column 5 (but gives nothing about the content of src/foo.c near that position). Of course, once gdb knows the file path src/foo.c, it is able to read that source file (if available and up to date w.r.t. the executable) so it can list it.

Extracting that debugging meta-data is a different question. Once you have understood DWARF you could use tools like objdump or readelf or addr2line or dwarfdump or libdwarf; and you could also script gdb (recent versions of GDB may be extendable in Python or in Guile) and use it on your ELF executable.

Perhaps you should consider Ian Taylor's libbacktrace. It uses the DWARF information to provide nice looking backtraces at runtime.

BTW, cgdb is (like ddd) only a front-end to gdb which does all the real work of processing that DWARF information. It is free software, you can study its source code.

I have only a.out, and I want to list the file names

You might try dwarfdump -i a.out | grep DW_AT_decl_file, and you could use some GNU awk command instead of grep. You need to dive into the details of the DWARF specifications, and you need to understand more about the elf(5) format.
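The grep/awk post-processing step could also be done in a few lines of Python. This is only a sketch: it assumes the dump text has the `NS uri: "..."` line shape shown in the question (the exact text printed by dwarfdump varies between versions and options), and the function name `source_paths` and the sample rows are hypothetical.

```python
import re

# Assumption: paths appear as  uri: "/some/path"  as in the question's output, e.g.
#   0x0000064a  [   9, 0] NS uri: "/home/main.c"
URI_RE = re.compile(r'uri:\s*"([^"]+)"')

def source_paths(dump_text):
    """Return the unique file paths found in the dump, in first-seen order."""
    seen = []
    for line in dump_text.splitlines():
        match = URI_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

if __name__ == "__main__":
    # Hypothetical sample rows in the same shape as the question's output.
    sample = (
        '0x0000064a  [   9, 0] NS uri: "/home/main.c"\n'
        '0x000006dd  [   2, 0] NS uri: "/home/zzzz.c"\n'
        '0x000006f1  [   2, 0] NS uri: "/home/main.c"\n'
    )
    for path in source_paths(sample):
        print(path)
```

In practice the text would come from saving the dwarfdump output to a file, or piping it in, and passing it to `source_paths`.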

It doesn't list all the header files

This is expected. Most header files don't contain any code, only declarations (e.g. printf is not implemented in <stdio.h> but in some C source file of your C standard library, e.g. in tree/src/stdio/printf.c if you use musl-libc; it is just declared in /usr/include/stdio.h). DWARF (and other debug information formats) describe the binary code. And some header files get included only to give access to a few preprocessor macros (which get expanded or skipped at preprocessing time).

Maybe you dream of homoiconic programming languages, then try Common Lisp (e.g. with SBCL).

If your question is how to use gdb, then please read the Debugging with GDB manual.

If your question is about decompilers, be aware that it is an impossible task in general (e.g. because of Rice's theorem). BTW, programs inside most Linux distributions are generally free software, so it is quite easy to get the source code (and you could even avoid using proprietary software on Linux).

BTW, you could also do more at compilation time by passing more flags to gcc. You might pass -H or -M (etc.) to gcc (in addition to -g). You could even consider writing your own GCC plugin to collect the information you want in some database (but that is probably not worth the effort). You could also consider improving your build automation (e.g. adding more to your Makefile) to collect such information. BTW, many large C programs use some metaprogramming techniques, with some .c files containing #line directives generated by tools (e.g. bison) or scripts; in that case, what kind of file path do you want to keep?

We are looking for a procedure to easily list all the files that are compiled together to make an executable.
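To illustrate the -H route: gcc -H prints each included header on stderr, prefixed by one dot per level of include nesting. The sketch below turns such captured text into (depth, path) pairs. The function name `included_headers` and the sample paths are hypothetical, and lines that do not start with dots followed by a space (such as the include-guard hints gcc may print afterwards) are simply skipped.

```python
import re

# Assumption: header lines look like ". /usr/include/stdio.h" or ".. bits/types.h",
# i.e. one or more dots, a space, then the path.
H_LINE = re.compile(r'^(\.+) (.+)$')

def included_headers(h_output):
    """Return a list of (nesting_depth, header_path) from captured `gcc -H` text."""
    headers = []
    for line in h_output.splitlines():
        match = H_LINE.match(line)
        if match:
            headers.append((len(match.group(1)), match.group(2)))
    return headers

if __name__ == "__main__":
    # Hypothetical sample of captured stderr text.
    sample = (
        ". /usr/include/stdio.h\n"
        ".. /usr/include/features.h\n"
        ". util.h\n"
        "Multiple include guards may be useful for:\n"
        "util.h\n"
    )
    for depth, path in included_headers(sample):
        print(depth, path)
```

Collecting this per translation unit during the build gives the header list that the debug information alone does not provide.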

If you are writing that executable and compiling it from its source code, I would suggest collecting that information at build time. It could be as trivial as passing some -M and/or -H flag to gcc, perhaps into some generated timestamp.c file (see this for inspiration; but your timestamp.c might contain information provided by gcc -M etc...). Your timestamp file might contain git version control metadata (like generated in this Makefile). Read also about reproducible builds and about package managers.
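As one possible way to collect that information at build time, the Makefile-style rules printed by gcc -M can be parsed into a per-target file list. This is a minimal sketch under stated assumptions: the function name `parse_dependencies` and the sample rule are hypothetical, and paths containing spaces or colons are not handled.

```python
def parse_dependencies(dep_text):
    """Parse Makefile-style rules (as emitted by `gcc -M`) into {target: [files]}."""
    # Long rules are continued with a backslash at the end of the line; join them.
    joined = dep_text.replace("\\\n", " ")
    deps = {}
    for rule in joined.splitlines():
        if ":" not in rule:
            continue  # skip blank or unrelated lines
        target, _, prerequisites = rule.partition(":")
        deps.setdefault(target.strip(), []).extend(prerequisites.split())
    return deps

if __name__ == "__main__":
    # Hypothetical sample rule in the shape gcc -M produces.
    sample = "main.o: main.c /usr/include/stdio.h \\\n util.h\n"
    for target, files in parse_dependencies(sample).items():
        print(target, "<-", ", ".join(files))
```

The resulting lists for all object files could then be merged and written into a generated file such as the timestamp.c mentioned above.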

answered on Stack Overflow Dec 7, 2017 by Basile Starynkevitch • edited Dec 8, 2017 by Basile Starynkevitch

User contributions licensed under CC BY-SA 3.0