Yocto system lockup: rcu_preempt detected stalls on CPUs/tasks


I'm dealing with an intermittent lockup issue on an embedded Yocto system (kernel 4.1.15-2.0.0).

Over the course of a few weeks we have been able to reproduce it several times across several units, but not consistently; when a lockup does occur, it usually takes several days of uptime. The system becomes unresponsive over the serial port and Ethernet, stops responding to pings, and a service we created just to blink an LED dies as well.

I'm figuring it's a kernel/driver lockup of some sort, and we were finally able to pull a dmesg log from a unit over ssh, up to the point where ssh died:

pastebin
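
Since the network dies along with everything else, I've also been thinking about mirroring the kernel log to persistent storage so the tail of the next lockup survives a power cycle. A minimal sketch of what I have in mind, assuming Python is on the image; the output path is just a placeholder:

#!/usr/bin/env python3
# Mirror /dev/kmsg to persistent storage so the last kernel messages
# before a lockup survive a reboot. Reading /dev/kmsg from the start
# replays the current ring buffer, then blocks until new records arrive.
import os

LOG_PATH = "/data/kmsg.log"  # placeholder: any path on persistent storage

def main():
    with open("/dev/kmsg", "r", errors="replace") as kmsg, \
         open(LOG_PATH, "a") as out:
        for record in kmsg:          # one line per kernel log record
            out.write(record)
            out.flush()
            os.fsync(out.fileno())   # push it to flash before a possible hang

if __name__ == "__main__":
    main()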

I've included the entire dmesg log since it captures the boot as well, but the interesting part starts at 307362, after the audit messages. We're seeing

[307362.408117] INFO: rcu_preempt detected stalls on CPUs/tasks:
[307362.412514]     (detected by 1, t=2102 jiffies, g=12684671, c=12684670, q=711)
[307362.418223] All QSes seen, last rcu_preempt kthread activity 2101 (30706237-30704136), jiffies_till_next_fqs=1, root ->qsmask 0x0
[307362.428582] cfinteractive   R running      0    38      2 0x00000000

followed by a backtrace. This repeats a few times with an extra message:

[307362.430710] rcu_preempt kthread starved for 2101 jiffies!

before the connection eventually dies and the system presumably locks up.

What I'm taking from these messages is that the rcu_preempt kernel thread is being starved by something... priority inversion perhaps? (If HZ is 100 on this board, 2101 jiffies is roughly 21 seconds, which would line up with the default RCU stall timeout.)
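
To test the priority-inversion theory, one thing I plan to do is dump which threads are running with a realtime scheduling policy, since a CPU-bound SCHED_FIFO/SCHED_RR task could keep the rcu_preempt kthread off the CPU. This is just a rough diagnostic of my own (field positions are from proc(5), and it again assumes Python on the target):

#!/usr/bin/env python3
# List every thread running with a realtime scheduling policy, to spot
# anything that could be starving the rcu_preempt kthread on one CPU.
import os

# policy values from sched(7)
POLICIES = {1: "SCHED_FIFO", 2: "SCHED_RR", 6: "SCHED_DEADLINE"}

def stat_fields(path):
    # Return (comm, fields-after-comm) from a /proc/.../stat file.
    with open(path) as f:
        data = f.read()
    comm = data[data.index("(") + 1:data.rindex(")")]
    rest = data[data.rindex(")") + 1:].split()
    return comm, rest

def main():
    for pid in filter(str.isdigit, os.listdir("/proc")):
        task_dir = "/proc/%s/task" % pid
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                comm, rest = stat_fields("%s/%s/stat" % (task_dir, tid))
            except (OSError, ValueError):
                continue
            # rest[0] is stat field 3 (state), so field N is rest[N - 3]
            rt_prio = int(rest[40 - 3])  # field 40: rt_priority
            policy = int(rest[41 - 3])   # field 41: scheduling policy
            if policy in POLICIES:
                print("%-6s %-6s %-16s %-14s rt_prio=%s"
                      % (pid, tid, comm, POLICIES[policy], rt_prio))

if __name__ == "__main__":
    main()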

How do I go about chasing down the addresses listed in the backtrace? And does anyone have a suggested direction for tackling this issue?
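
For the addresses, what I've been trying so far is feeding them to addr2line against a vmlinux built with CONFIG_DEBUG_INFO, roughly as sketched below. The vmlinux path is a placeholder for wherever the Yocto build puts it, I'm assuming the image was built from exactly that kernel, and it needs the cross addr2line from the SDK/toolchain rather than the host one. It also assumes the backtrace addresses appear in the usual [<address>] form.

#!/usr/bin/env python3
# Resolve the [<address>] entries from a saved dmesg backtrace to
# function/file:line using addr2line and a vmlinux with debug info.
import re
import subprocess
import sys

VMLINUX = "/path/to/vmlinux"  # placeholder: vmlinux matching the running kernel
ADDR2LINE = "addr2line"       # use the cross toolchain's addr2line for ARM

# Pull out anything printed as [<address>] in the backtrace lines
ADDR_RE = re.compile(r"\[<([0-9a-f]+)>\]")

def resolve(addresses):
    # -f prints function names, -i expands inlined call chains
    cmd = [ADDR2LINE, "-e", VMLINUX, "-f", "-i"] + addresses
    return subprocess.check_output(cmd).decode()

def main():
    addresses = []
    for line in sys.stdin:       # pipe the saved dmesg text in on stdin
        addresses.extend(ADDR_RE.findall(line))
    if addresses:
        print(resolve(addresses))

if __name__ == "__main__":
    main()

On the host I'd run it as ./resolve_trace.py < saved_dmesg.txt, but I'm not sure this is the right workflow, so pointers are welcome.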

linux-kernel
embedded-linux
yocto
asked on Stack Overflow Sep 17, 2019 by jpsalm

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0