Valgrind on MIPS Reports no Heap Usage


I'm using valgrind (v3.10.0) to hunt down a memory leak in a complex application (a heavily modified build of net-snmp) that is being built as part of a bigger software suite. I am sure there is a leak (the memory footprint of the application grows linearly without bound), but valgrind always reports the following upon termination.

==1139== HEAP SUMMARY:
==1139==     in use at exit: 0 bytes in 0 blocks
==1139==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==1139== 
==1139== All heap blocks were freed -- no leaks are possible

The total heap usage cannot be zero -- there are many, many calls to malloc and free throughout the application. Valgrind is still capable of finding "Invalid Write" errors.

The application in question is being compiled, along with other software packages, with a uclibc-gcc toolchain for the MIPS processor (uClibc v0.9.29) to be flashed onto an embedded device running a BusyBox (v1.17.2) Linux shell. I am running Valgrind directly on the device. I use the following options when launching Valgrind:

--tool=memcheck --leak-check=full --undef-value-errors=no --trace-children=yes

Basically, Valgrind doesn't detect any heap usage even though I've used the heap. Why might this be? Are any of my assumptions (below) wrong?


What I've Tried

Simple Test Program

I compiled the simple test program (using the same target and toolchain as the application above) from the Valgrind quick-start tutorial, to see if Valgrind would detect the leak. The final output was the same as above: no heap usage.

Linking Issues?

Valgrind documentation has the following to say on their FAQ:

If your program is statically linked, most Valgrind tools will only work well if they are able to replace certain functions, such as malloc, with their own versions. By default, statically linked malloc functions are not replaced. A key indicator of this is if Memcheck says "All heap blocks were freed -- no leaks are possible".

The above sounds exactly like my problem, so I checked that the executable is dynamically linked to the C libraries that contain malloc and free. I used the uClibc toolchain's custom ldd executable (I can't use the native Linux ldd), and the output included the following lines:

libc.so.0 => not found (0x00000000)
/lib/ld-uClibc.so.0 => /lib/ld-uClibc.so.0 (0x00000000)

(They show as "not found" because I'm running ldd on the x86 host machine; the MIPS target device doesn't have an ldd executable.) As far as I understand, malloc and free live in one of these libraries, and both libraries appear to be dynamically linked. I also ran readelf and nm on the executable to confirm that the references to malloc and free are undefined (which is characteristic of a dynamically linked executable).

Additionally, I tried launching Valgrind with the --soname-synonyms=somalloc=NONE option as suggested by the FAQ.

LD_PRELOAD support?

As pointed out by commenters and answerers, Valgrind relies on LD_PRELOAD. It has been suggested that my toolchain doesn't support this feature. To confirm that it does, I followed this example to create a simple test library and load it (I replaced rand() with a function that just returns 42). The test worked, so my target appears to support LD_PRELOAD just fine.
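A test library of this kind can be as small as a single function. The sketch below is an assumed reconstruction, not the exact example referenced above: it assumes the test program calls rand() through the dynamic symbol table, and the file name fake_rand.c, the library name, and the toolchain prefix in the build commands are hypothetical.

```c
/* fake_rand.c: hypothetical LD_PRELOAD test library.
 * When preloaded, the dynamic linker resolves rand() here before it
 * reaches the C library, so every call returns 42. */
#include <stdlib.h>

int rand(void)
{
    return 42; /* a constant result makes successful interposition obvious */
}
```

Built as a shared object with the cross toolchain and preloaded, it would be used roughly like this:

mips-linux-uclibc-gcc -shared -fPIC -o libfakerand.so fake_rand.c
LD_PRELOAD=./libfakerand.so ./rand_test

Here rand_test stands for any (hypothetical) program that prints the result of rand(); if it prints 42, the dynamic linker honored LD_PRELOAD.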

ELF Data

I'll also include some information from the readelf command which may be useful. Rather than a giant dump, I've trimmed things down to include only what may be relevant.

Dynamic section
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libnetsnmpagent.so.30]
 0x00000001 (NEEDED)                     Shared library: [libnetsnmpmibs.so.30]
 0x00000001 (NEEDED)                     Shared library: [libnetsnmp.so.30]
 0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]
 0x00000001 (NEEDED)                     Shared library: [libc.so.0]
 0x0000000f (RPATH)                      Library rpath: [//lib]

Symbol table '.dynsym'
   Num:    Value  Size Type    Bind   Vis      Ndx Name
    27: 00404a40     0 FUNC    GLOBAL DEFAULT  UND free
    97: 00404690     0 FUNC    GLOBAL DEFAULT  UND malloc
asked on Stack Overflow Sep 30, 2014 by Woodrow Barlow • edited Oct 8, 2014 by Woodrow Barlow

3 Answers


First, let's do a real test to see whether something is statically linked.

$ ldd -v /bin/true
    linux-vdso.so.1 =>  (0x00007fffdc502000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0731e11000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f07321ec000)

    Version information:
    /bin/true:
        libc.so.6 (GLIBC_2.3) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.3.4) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.14) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.4) => /lib/x86_64-linux-gnu/libc.so.6
        libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
    /lib/x86_64-linux-gnu/libc.so.6:
        ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
        ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2

The second line in the output shows it is dynamically linked to libc, which is what contains malloc.

As for what might be going wrong, I can suggest four things:

  1. Perhaps it's not linked to normal libc, but to some other C library (e.g. uclibc) or something else valgrind is not expecting. The above test will show you exactly what it's linked to. In order for valgrind to work, it uses LD_PRELOAD to wrap the malloc() and free() functions (description of general function wrapping here). If your libc substitute doesn't support LD_PRELOAD or (somehow) the C library's malloc() and free() aren't being used at all (with those names), then valgrind is not going to work. Perhaps you could include the link line used when you build your application.

  2. It is leaking, but not through malloc(). For instance, it might (unlikely) be calling brk() itself, or (more likely) allocating memory with mmap(). You can check this by looking at /proc/PID/maps (the dump below is from cat itself).


$ cat /proc/PIDNUMBERHERE/maps
00400000-0040b000 r-xp 00000000 08:01 805303                             /bin/cat
0060a000-0060b000 r--p 0000a000 08:01 805303                             /bin/cat
0060b000-0060c000 rw-p 0000b000 08:01 805303                             /bin/cat
02039000-0205a000 rw-p 00000000 00:00 0                                  [heap]
7fbc8f418000-7fbc8f6e4000 r--p 00000000 08:01 1179774                    /usr/lib/locale/locale-archive
7fbc8f6e4000-7fbc8f899000 r-xp 00000000 08:01 1573024                    /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8f899000-7fbc8fa98000 ---p 001b5000 08:01 1573024                    /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8fa98000-7fbc8fa9c000 r--p 001b4000 08:01 1573024                    /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8fa9c000-7fbc8fa9e000 rw-p 001b8000 08:01 1573024                    /lib/x86_64-linux-gnu/libc-2.15.so
7fbc8fa9e000-7fbc8faa3000 rw-p 00000000 00:00 0
7fbc8faa3000-7fbc8fac5000 r-xp 00000000 08:01 1594541                    /lib/x86_64-linux-gnu/ld-2.15.so
7fbc8fca6000-7fbc8fca9000 rw-p 00000000 00:00 0
7fbc8fcc3000-7fbc8fcc5000 rw-p 00000000 00:00 0
7fbc8fcc5000-7fbc8fcc6000 r--p 00022000 08:01 1594541                    /lib/x86_64-linux-gnu/ld-2.15.so
7fbc8fcc6000-7fbc8fcc8000 rw-p 00023000 08:01 1594541                    /lib/x86_64-linux-gnu/ld-2.15.so
7fffe1674000-7fffe1695000 rw-p 00000000 00:00 0                          [stack]
7fffe178d000-7fffe178f000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Note whether the end address of [heap] is actually growing, or whether you are seeing additional mmap entries. Another good way to check whether valgrind is working is to send a SIGSEGV or similar to the process and see whether any heap is reported in use on exit.
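To compare snapshots over time rather than eyeballing them, a small helper can total the [heap] mapping and the anonymous (unnamed) mappings in a maps file. This is a sketch that assumes the Linux /proc/PID/maps line format shown above; the function and struct names are hypothetical. Anonymous mappings also include thread stacks and other memory that did not come from malloc(), so treat the totals as a trend indicator only.

```c
#include <stdio.h>
#include <string.h>

/* Running totals for one snapshot of a /proc/PID/maps file. */
struct map_summary {
    unsigned long long heap_bytes; /* size of the [heap] (brk) region */
    unsigned long long anon_bytes; /* total size of mappings with no name */
    int anon_count;                /* number of mappings with no name */
};

/* Adds one maps line to *s. Returns 0 if the address range parsed, -1 otherwise. */
int add_maps_line(const char *line, struct map_summary *s)
{
    unsigned long long start, end;

    if (sscanf(line, "%llx-%llx", &start, &end) != 2 || end < start)
        return -1;
    if (strstr(line, "[heap]") != NULL) {
        s->heap_bytes += end - start;
    } else if (strchr(line, '/') == NULL && strchr(line, '[') == NULL) {
        /* No file path and no [tag]: an anonymous mapping, e.g. from mmap(). */
        s->anon_bytes += end - start;
        s->anon_count++;
    }
    return 0;
}

/* Summarizes a whole maps file, e.g. "/proc/self/maps". Returns 0 on success. */
int summarize_maps(const char *path, struct map_summary *s)
{
    char line[4096];
    int at_line_start = 1;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    memset(s, 0, sizeof *s);
    while (fgets(line, sizeof line, f) != NULL) {
        if (at_line_start)
            add_maps_line(line, s);
        /* No '\n' means the line was longer than the buffer; skip the rest. */
        at_line_start = strchr(line, '\n') != NULL;
    }
    fclose(f);
    return 0;
}
```

Calling summarize_maps() on the real PID's maps file every few minutes and logging heap_bytes and anon_bytes shows which of the two is growing.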

  3. It isn't leaking in the strict sense, but it is leaking to all intents and purposes. For instance, perhaps it has a data structure (like a cache) which grows over time. On exit, the program (correctly) frees all entries in the cache, so nothing is in use on the heap at exit. In this instance, you'll want to know what is growing. This is a harder proposition. I'd use the technique above of killing the program, capture the output, and post-process it. If you see 500 things after 24 hours, 1,000 after 48 hours, and 1,500 after 72 hours, that should give you an indication of what is 'leaking'. However, as haris points out in the comments, whilst this would result in the memory not being shown as leaks, it doesn't explain the 'total heap usage' being zero, as that figure counts every allocation made, whether or not it was later freed.

  4. Perhaps valgrind is just not working on your platform. What happens if you build a very simple program like the one below, and run valgrind on it on your platform? If this isn't working, you need to find out why valgrind is not operating correctly. Note that valgrind on MIPS is pretty new. Here is an email thread where a developer with MIPS and uclibc discovers valgrind is not reporting any allocations. His solution is to replace NPTL with LinuxThreads.


#include <stdio.h>
#include <stdlib.h>
int
main (int argc, char **argv)
{
  void *p = malloc (100);       /* does not leak */
  void *q = malloc (100);       /* leaks */
  free (p);
  exit (0);
}
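The scenario where memory grows while the program runs but is freed before exit is easy to reproduce. The sketch below uses a hypothetical singly linked cache that never evicts and is emptied at shutdown. Under a working memcheck, a program like this should report a large "total heap usage" but "0 bytes in 0 blocks" in use at exit; a zero total, as in the question, therefore points at interception failing rather than at this pattern.

```c
#include <stdlib.h>

/* Hypothetical cache entry: stands in for whatever structure grows over time. */
struct entry {
    struct entry *next;
    char payload[64];
};

static struct entry *cache_head = NULL;
static size_t cache_len = 0;

/* Adds an entry and never evicts, so memory grows with every call. */
int cache_add(void)
{
    struct entry *e = malloc(sizeof *e);

    if (e == NULL)
        return -1;
    e->next = cache_head;
    cache_head = e;
    cache_len++;
    return 0;
}

/* Number of live entries: the figure worth logging periodically. */
size_t cache_size(void)
{
    return cache_len;
}

/* Frees everything; called at shutdown, so nothing is "in use at exit". */
void cache_free_all(void)
{
    while (cache_head != NULL) {
        struct entry *next = cache_head->next;
        free(cache_head);
        cache_head = next;
    }
    cache_len = 0;
}
```

Logging cache_size() (or whatever count the real structure keeps) at intervals is what reveals this kind of growth, since the exit-time leak report cannot.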
answered on Stack Overflow Oct 4, 2014 by abligh • edited Oct 5, 2014 by abligh

(Adding another answer as the question itself has changed substantially after OP awarded the first bounty)

Based on my understanding of your edits, you have now:

  1. Replicated the problem with valgrind's own test program
  2. Confirmed the test program binary is dynamically linked to uclibc
  3. Confirmed LD_PRELOAD is working on your system
  4. Confirmed (if only by using the test program) that this isn't symbol interference from another library

To me, that indicates that valgrind has a bug or is incompatible with your toolchain. I found references saying it should work with your toolchain, so either way that suggests a bug to me.

I suggest therefore that you report a bug using the mechanism described here. Perhaps leave out the bit about your complicated application, and just point out the simple test program doesn't work. If you haven't already, you might try the users mailing list as described here.

answered on Stack Overflow Oct 13, 2014 by abligh

Quoting the question: "In order to confirm that the executable is not statically linked, I ran file snmpd."

Your problem is most likely not that the binary is statically linked (you now know it is not), but that malloc and free are statically linked into it (perhaps you are using an alternative malloc implementation, such as tcmalloc?).

When you built the simple test case (on which Valgrind worked correctly), you likely didn't use the same link command line (and the same libraries) as your real application does.

In any case, it is trivial to check:

readelf -Ws snmpd | grep ' malloc'

If this shows UND (i.e. undefined), then Valgrind should have no trouble intercepting it. But chances are it shows FUNC GLOBAL DEFAULT ... malloc instead (with a section number rather than UND in the Ndx column), which means that your snmpd is as good as statically linked as far as valgrind is concerned.
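If you need to run this check over many binaries and libraries, the classification can be scripted. The sketch below assumes readelf -Ws output in the column layout shown in the question (Num, Value, Size, Type, Bind, Vis, Ndx, Name); the function name is hypothetical. Versioned names such as malloc@GLIBC_2.2.5 are matched by prefix, while look-alikes such as malloc_trim, which grep ' malloc' would also match, are rejected.

```c
#include <stdio.h>
#include <string.h>

/* Classifies one line of `readelf -Ws` output for the symbol `name`.
 * Returns 1 if the symbol is undefined (UND, resolved from a shared library
 * at run time), 0 if this file defines it, and -1 if the line is not a
 * symbol-table entry for `name`. */
int classify_symbol_line(const char *line, const char *name)
{
    char value[32], size[32], type[16], bind[16], vis[16], ndx[16], sym[256];
    size_t n = strlen(name);

    /* Skip the "Num:" column, then read the seven remaining columns. */
    if (sscanf(line, " %*[0-9]: %31s %31s %15s %15s %15s %15s %255s",
               value, size, type, bind, vis, ndx, sym) != 7)
        return -1;
    /* Accept "malloc" and versioned forms like "malloc@GLIBC_2.2.5". */
    if (strncmp(sym, name, n) != 0 || (sym[n] != '\0' && sym[n] != '@'))
        return -1;
    return strcmp(ndx, "UND") == 0 ? 1 : 0;
}
```

Feeding each line of readelf -Ws snmpd to this function gives 1 for a malloc that Valgrind can intercept and 0 for a malloc linked into the binary itself, which is the case this answer suspects.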

Assuming my guess is correct, relink snmpd with the -Wl,-y,malloc flag. That will tell you which library defines your malloc. Remove it from the link, find and fix the leak, then decide whether having that library is worth the trouble it has caused you.

answered on Stack Overflow Oct 6, 2014 by Employed Russian

User contributions licensed under CC BY-SA 3.0