Code hangs randomly after 1hr/1day/30 days

I am using an Altera Cyclone V FPGA with an ARMv7 core, and I am running an application with 7 threads that use mutexes.

The application hangs at random: after 1 hour, 1 day, or 1 month, with no defined interval.

I ran strace while the application was running smoothly, and it gives:

---------------------------------------------------------------------------------------------------------
-------------------- RUNNING / HEALTHY STATE
---------------------------------------------------------------------------------------------------------

root@socfpga:~# strace -p 297 -f
Process 297 attached with 7 threads
[pid   311] recvfrom(6,  <unfinished ...>
[pid   297] nanosleep({0, 10000000},  <unfinished ...>
[pid   340] nanosleep({0, 500000000},  <unfinished ...>
......
[pid   339] <... nanosleep resumed> NULL) = 0
[pid   339] nanosleep({0, 500000000},  <unfinished ...>
[pid   297] <... nanosleep resumed> NULL) = 0
[pid   297] nanosleep({0, 10000000}, NULL) = 0
[pid   297] nanosleep({0, 10000000}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
[pid   297] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   297] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   297] nanosleep({0, 10000000}, NULL) = 0
[pid   297] nanosleep({0, 10000000},  <unfinished ...>
[pid   340] <... nanosleep resumed> NULL) = 0
[pid   340] nanosleep({0, 500000000},  <unfinished ...>
[pid   297] <... nanosleep resumed> NULL) = 0
[pid   297] nanosleep({0, 10000000}, NULL) = 0
[pid   297] nanosleep({0, 10000000},  <unfinished ...>
[pid   339] <... nanosleep resumed> NULL) = 0
[pid   339] nanosleep({0, 500000000},  <unfinished ...>
[pid   297] <... nanosleep resumed> NULL) = 0
[pid   297] nanosleep({0, 10000000}, NULL) = 0
[pid   297] nanosleep({0, 10000000}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
[pid   297] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   297] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   297] nanosleep({0, 10000000}, NULL) = 0
[pid   297] nanosleep({0, 10000000},  <unfinished ...>
[pid   340] <... nanosleep resumed> NULL) = 0
[pid   340] nanosleep({0, 500000000},  <unfinished ...>
[pid   297] <... nanosleep resumed> NULL) = 0
[pid   297] nanosleep({0, 10000000}, NULL) = 0
[pid   297] nanosleep({0, 10000000},  <unfinished ...>
[pid   339] <... nanosleep resumed> NULL) = 0
[pid   339] nanosleep({0, 500000000},  <unfinished ...>
[pid   297] <... nanosleep resumed> NULL) = 0
[pid   297] nanosleep({0, 10000000}, NULL) = 0
[pid   297] nanosleep({0, 10000000}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
[pid   297] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   297] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   297] nanosleep({0, 10000000}, NULL) = 0
[pid   297] nanosleep({0, 10000000},  <unfinished ...>
[pid   340] <... nanosleep resumed> NULL) = 0
[pid   340] nanosleep({0, 500000000},  <unfinished ...>
[pid   297] <... nanosleep resumed> NULL) = 0
.......
[pid   297] nanosleep({0, 10000000}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
[pid   297] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   297] gettimeofday({1495402377, 473913}, NULL) = 0
[pid   297] write(3, "20170521 21:32:57.473 INFO     d"..., 100) = 100
[pid   297] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   297] gettimeofday({1495402377, 474831}, NULL) = 0
[pid   297] write(3, "20170521 21:32:57.474 ERROR    d"..., 110) = 110
[pid   297] nanosleep({0, 10000000}, NULL) = 0
[pid   297] nanosleep({0, 10000000},  <unfinished ...>
[pid   340] <... nanosleep resumed> NULL) = 0
[pid   340] nanosleep({0, 500000000},  <unfinished ...>
[pid   297] <... nanosleep resumed> NULL) = 0
[pid   297] nanosleep({0, 10000000}, NULL) = 0
[pid   297] nanosleep({0, 10000000}, ^CProcess 297 detached
 <detached ...>
Process 309 detached
Process 310 detached
Process 311 detached
Process 312 detached
Process 339 detached
Process 340 detached

When the application hangs, the strace output is as follows:

---------------------------------------------------------------------------------------------------------
-------------------- HANG STATE
---------------------------------------------------------------------------------------------------------
root@socfpga:~# strace -p 297
Process 297 attached
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL^CProcess 297 detached
 <detached ...>
root@socfpga:~# strace -p 297 -f
Process 297 attached with 7 threads
[pid   312] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   340] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   339] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   311] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   310] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   309] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   297] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   312] <... futex resumed> )       = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid   312] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   312] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   312] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid   312] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   312] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   312] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid   312] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   312] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   312] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid   312] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   312] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   312] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid   312] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   312] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   312] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid   312] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   312] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   312] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid   312] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   312] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   312] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid   312] --- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
[pid   312] rt_sigreturn()              = -1 EINTR (Interrupted system call)
[pid   312] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL^CProcess 297 detached
Process 309 detached
Process 310 detached
Process 311 detached
Process 312 detached
 <detached ...>
Process 339 detached
Process 340 detached

root@socfpga:~# strace -p 310
Process 310 attached
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL^CProcess 310 detached
 <detached ...>
root@socfpga:~# strace -p 311
Process 311 attached
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL^CProcess 311 detached
 <detached ...>
root@socfpga:~# strace -p 312
Process 312 attached
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGRT_2 {si_signo=SIGRT_2, si_code=SI_TIMER, si_pid=0, si_uid=0, si_value=209660} ---
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL^CProcess 312 detached
 <detached ...>
root@socfpga:~# strace -p 339
Process 339 attached
futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL


^CProcess 339 detached
 <detached ...>



Quit anyway? (y or n) y
Detaching from program: /home/user/user/process_cc, process 297



root@socfpga:~# gdb -p 309
GNU gdb (Linaro GDB) 7.8-2014.09
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-angstrom-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://bugs.linaro.org>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 309
warning: process 309 is a cloned process
Reading symbols from /home/user/user/process_cc...done.
Reading symbols from /usr/lib/liblog4c.so.3...done.
Loaded symbols for /usr/lib/liblog4c.so.3
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-armhf.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux-armhf.so.3
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x76f3ed50 in ?? () from /lib/libpthread.so.0
(gdb) bt
#0  0x76f3ed50 in ?? () from /lib/libpthread.so.0
#1  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) quit
A debugging session is active.

    Inferior 1 [process 309] will be detached.

Quit anyway? (y or n) y
Detaching from program: /home/user/user/process_cc, process 309
root@socfpga:~# gdb -p 310
Attaching to process 310

warning: process 310 is a cloned process
Reading symbols from /home/user/user/process_cc...done.
Reading symbols from /usr/lib/liblog4c.so.3...done.
Loaded symbols for /usr/lib/liblog4c.so.3
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-armhf.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux-armhf.so.3
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x76f3ed50 in ?? () from /lib/libpthread.so.0
(gdb) bt
#0  0x76f3ed50 in ?? () from /lib/libpthread.so.0
#1  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) quit
A debugging session is active.

    Inferior 1 [process 310] will be detached.

Quit anyway? (y or n) y
Detaching from program: /home/user/user/process_cc, process 310


Memory layout (from info proc mappings in gdb):

    Start Addr   End Addr       Size     Offset objfile
       0x10000    0x23000    0x13000        0x0 /home/user/user/process_cc
       0x33000    0x34000     0x1000    0x13000 /home/user/user/process_cc
       0x34000   0x208000   0x1d4000        0x0 
     0x1eba000  0x1edb000    0x21000        0x0 [heap]
    0x73800000 0x73801000     0x1000        0x0 
    0x73801000 0x74000000   0x7ff000        0x0 [stack:340]
    0x74000000 0x74001000     0x1000        0x0 
    0x74001000 0x74800000   0x7ff000        0x0 [stack:339]
    0x74800000 0x74821000    0x21000        0x0 
    0x74821000 0x74900000    0xdf000        0x0 
    0x74900000 0x74921000    0x21000        0x0 
    0x74921000 0x74a00000    0xdf000        0x0 
    0x74a00000 0x74a21000    0x21000        0x0 
    0x74a21000 0x74b00000    0xdf000        0x0 
    0x74b64000 0x74b65000     0x1000        0x0 
    0x74b65000 0x75364000   0x7ff000        0x0 [stack:312]
    0x75364000 0x75365000     0x1000        0x0 
    0x75365000 0x75b64000   0x7ff000        0x0 [stack:311]
    0x75b64000 0x75b65000     0x1000        0x0 
    0x75b65000 0x76364000   0x7ff000        0x0 [stack:310]
    0x76364000 0x76365000     0x1000        0x0 
    0x76365000 0x76b64000   0x7ff000        0x0 [stack:309]
    0x76b64000 0x76d64000   0x200000 0xff200000 /dev/mem
    0x76d64000 0x76e89000   0x125000        0x0 /lib/libc-2.20.so
    0x76e89000 0x76e99000    0x10000   0x125000 /lib/libc-2.20.so
    0x76e99000 0x76e9b000     0x2000   0x125000 /lib/libc-2.20.so
    0x76e9b000 0x76e9c000     0x1000   0x127000 /lib/libc-2.20.so
    0x76e9c000 0x76e9f000     0x3000        0x0 
    0x76e9f000 0x76ea5000     0x6000        0x0 /lib/librt-2.20.so
    0x76ea5000 0x76eb4000     0xf000     0x6000 /lib/librt-2.20.so
    0x76eb4000 0x76eb5000     0x1000     0x5000 /lib/librt-2.20.so
    0x76eb5000 0x76eb6000     0x1000     0x6000 /lib/librt-2.20.so
    0x76eb6000 0x76f1f000    0x69000        0x0 /lib/libm-2.20.so
    0x76f1f000 0x76f2e000     0xf000    0x69000 /lib/libm-2.20.so
    0x76f2e000 0x76f2f000     0x1000    0x68000 /lib/libm-2.20.so
    0x76f2f000 0x76f30000     0x1000    0x69000 /lib/libm-2.20.so
    0x76f30000 0x76f44000    0x14000        0x0 /lib/libpthread-2.20.so
    0x76f44000 0x76f54000    0x10000    0x14000 /lib/libpthread-2.20.so
    0x76f54000 0x76f55000     0x1000    0x14000 /lib/libpthread-2.20.so
    0x76f55000 0x76f56000     0x1000    0x15000 /lib/libpthread-2.20.so
    0x76f56000 0x76f58000     0x2000        0x0 
    0x76f58000 0x76f6e000    0x16000        0x0 /usr/lib/liblog4c.so.3
    0x76f6e000 0x76f75000     0x7000    0x16000 /usr/lib/liblog4c.so.3
    0x76f75000 0x76f77000     0x2000    0x15000 /usr/lib/liblog4c.so.3
    0x76f77000 0x76f96000    0x1f000        0x0 /lib/ld-2.20.so
    0x76f99000 0x76f9b000     0x2000        0x0 
    0x76fa4000 0x76fa5000     0x1000        0x0 
    0x76fa5000 0x76fa6000     0x1000        0x0 [sigpage]
    0x76fa6000 0x76fa7000     0x1000    0x1f000 /lib/ld-2.20.so
    0x76fa7000 0x76fa8000     0x1000    0x20000 /lib/ld-2.20.so
    0x7e9f4000 0x7ea1a000    0x26000        0x0 [stack]
    0xffff0000 0xffff1000     0x1000        0x0 [vectors]

Please point out the reason for such behavior so that the issue can be resolved.

stack
mutex
deadlock
corrupt
futex
asked on Stack Overflow May 7, 2020 by faiz321

1 Answer

Based on the strace output, it looks like all 7 threads are waiting for the same mutex:

[pid   312] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   340] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   339] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   311] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   310] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   309] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   297] futex(0x1eba830, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
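As background on what each of these lines means: futex(addr, FUTEX_WAIT_PRIVATE, 2, NULL) asks the kernel to put the thread to sleep if the 32-bit word at addr still contains 2, until another thread issues a wake on that address; the NULL means there is no timeout. In the common three-state lock design, 2 stands for "locked, and there may be waiters". The C sketch below illustrates such a lock. It is an illustration only: sketch_lock, sketch_unlock and futex_call are hypothetical names, and glibc's real pthread_mutex_lock differs in detail.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word states: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */

/* Thin wrapper around the raw futex system call (Linux only). */
long futex_call(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void sketch_lock(atomic_int *word)
{
    int c = 0;
    if (atomic_compare_exchange_strong(word, &c, 1))
        return;                          /* uncontended: 0 -> 1, no syscall */
    if (c != 2)
        c = atomic_exchange(word, 2);    /* announce that a waiter exists */
    while (c != 0) {
        /* Sleep only if the word still reads 2; NULL timeout = wait forever.
         * This is the call that shows up in strace when a thread is blocked. */
        futex_call(word, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(word, 2);    /* retry; keep the "waiters" marker */
    }
}

void sketch_unlock(atomic_int *word)
{
    int prev = atomic_fetch_sub(word, 1);
    assert(prev == 1 || prev == 2);      /* unlocking an unlocked word is a bug */
    if (prev != 1) {                     /* was 2: someone may be asleep */
        atomic_store(word, 0);
        futex_call(word, FUTEX_WAKE_PRIVATE, 1);
    }
}
```

In this design, if the thread that set the word to a nonzero value never reaches the unlock path, every later caller ends up parked in the FUTEX_WAIT call, which matches the picture in the hang trace.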

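Attaching to the process (not a specific thread) with gdb and executing thread apply all bt would be a good start to identify which thread holds the mutex those threads are waiting for. The gdb sessions in the question warn that no matching libthread_db was found and that libpthread has no debugging symbols, which is most likely why the backtraces stop after one frame; a matching libthread_db and glibc debug symbols should be made available to gdb first.

If the owner is not obvious from the output, switch to one of the waiting threads (t <gdb-thread-id>), select the frame that called pthread_mutex_lock (f <frame-id>; assuming pthread mutexes are used) and determine the owner by executing print <pthread_mutex_t-ptr>->__data.__owner. If gdb cannot resolve __data.__owner, read the owner field directly via print *((int*)(<pthread_mutex_t-ptr>)+2). If this is a glibc pthread mutex, the futex word is the first field of the pthread_mutex_t, so the address from strace (0x1eba830, which lies inside the [heap] mapping) can be used as the pointer. The value printed is the thread ID (LWP) of the owner; search for it in the info threads output or in the initial command output to identify the owning thread and get its stack trace (t <gdb-thread-id> and bt).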
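To make the owner lookup concrete, here is a small C sketch showing that a glibc mutex records the kernel thread ID of the thread that locked it, and that the field sits two ints into the structure. It assumes glibc on Linux (the __data.__owner field is an implementation detail, not part of POSIX), and the function names are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the caller; this is the number strace prints as [pid N]. */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* glibc-specific: TID of the thread holding the mutex, 0 when it is free. */
int mutex_owner_tid(pthread_mutex_t *m)
{
    return m->__data.__owner;
}

/* Same field read by position, mirroring print *((int*)ptr + 2) in gdb.
 * Assumes the glibc layout: int __lock, unsigned __count, int __owner. */
int mutex_owner_tid_raw(pthread_mutex_t *m)
{
    return *((int *)m + 2);
}

/* Demo: lock a default mutex and check that the recorded owner is this thread. */
void owner_demo(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    assert(mutex_owner_tid(&m) == 0);
    pthread_mutex_lock(&m);
    assert(mutex_owner_tid(&m) == (int)current_tid());
    assert(mutex_owner_tid_raw(&m) == mutex_owner_tid(&m));
    pthread_mutex_unlock(&m);
    assert(mutex_owner_tid(&m) == 0);
}
```

If the owner read from the stuck mutex is one of the thread IDs that is itself blocked in futex on the same address, that thread has tried to take a lock it already holds.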

Be sure to create a core dump for later analysis by executing generate-core-file (or its alias gcore) in gdb; the dump can be reopened later with gdb <executable> <core-dump>.
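One pattern worth ruling out while looking for the owner: in the healthy trace, gettimeofday and write calls appear between the SIGRT_2 delivery and rt_sigreturn, which suggests the timer signal handler itself writes log output. If that logging path takes a mutex (an assumption, the traces do not prove it) and the signal interrupts a thread that already holds it, the handler would wait on a lock that can never be released. The sketch below shows the usual alternative in C: the handler only sets a flag and the normal thread loop does the logging. All names here are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, consumed by the normal code path. */
static volatile sig_atomic_t tick_pending = 0;

/* The handler only records the event: no logging, no locks, no malloc. */
static void on_tick(int signo)
{
    (void)signo;
    tick_pending = 1;
}

/* Install the handler for the given signal; returns 0 on success, -1 on error. */
int install_tick_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted syscalls where supported */
    return sigaction(signo, &sa, NULL);
}

/* Called from the thread's main loop. Returns 1 if a tick was recorded since
 * the last call (several ticks may be coalesced into one), otherwise 0.
 * After a return of 1 the caller may safely log or lock mutexes. */
int take_tick(void)
{
    if (!tick_pending)
        return 0;
    tick_pending = 0;
    return 1;
}

/* Demo using SIGUSR1 in place of the application's timer signal. */
void tick_demo(void)
{
    int rc = install_tick_handler(SIGUSR1);

    assert(rc == 0);
    raise(SIGUSR1);             /* handler has run by the time raise returns */
    assert(take_tick() == 1);
    assert(take_tick() == 0);
}
```

A main loop that already wakes every 10 ms, as thread 297 does in the trace, could call take_tick after each sleep and write the log entry there.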

answered on Stack Overflow May 12, 2020 by horstr

User contributions licensed under CC BY-SA 3.0