rcu_sched detected stalls on CPUs/tasks

Question

I'm running many VMs in VirtualBox. These VMs run Debian 10.3 (the latest version), and I'm experiencing bugs and freezes, as you can see below. It seems to happen on the VMs to which I have attached USB devices (Wi-Fi USB dongles) in VirtualBox: my SSH connection drops and the VM freezes.

I'm a newbie and I don't know where it comes from. Is it the kernel or the distribution?

As far as I can tell, it's a CPU problem. I always allocate 6 CPUs to each VM (I have a Ryzen 5 3600) and 2 or 4 GB of RAM (the host has 16 GB).

From dmesg:

[   61.290365] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   61.290391] rcu:     4-...!: (16 GPs behind) idle=4cc/0/0x0 softirq=1782/1782 fqs=1
[   61.290408] rcu:     (detected by 2, t=5282 jiffies, g=633, q=71)
[   61.290424] Sending NMI from CPU 2 to CPUs 4:
[   61.290471] NMI backtrace for cpu 4 skipped: idling at native_safe_halt+0xe/0x10
[   61.291424] rcu: rcu_sched kthread starved for 5244 jiffies! g633 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=4
[   61.291452] rcu: RCU grace-period kthread stack dump:
[   61.291467] rcu_sched       I    0    10      2 0x80000000
[   61.291468] Call Trace:
[   61.291475]  ? __schedule+0x2a2/0x870
[   61.291476]  schedule+0x28/0x80
[   61.291478]  schedule_timeout+0x16b/0x390
[   61.291480]  ? __next_timer_interrupt+0xc0/0xc0
[   61.291483]  rcu_gp_kthread+0x40d/0x850
[   61.291484]  ? call_rcu_sched+0x20/0x20
[   61.291486]  kthread+0x112/0x130
[   61.291487]  ? kthread_bind+0x30/0x30
[   61.291488]  ret_from_fork+0x35/0x40
[   82.349534] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   82.349560] rcu:     0-...!: (1 GPs behind) idle=f3c/0/0x0 softirq=924/924 fqs=0
[   82.349581] rcu:     4-...!: (0 ticks this GP) idle=558/0/0x0 softirq=1782/1782 fqs=0
[   82.349599] rcu:     5-...!: (13 GPs behind) idle=204/0/0x0 softirq=864/864 fqs=0
[   82.349616] rcu:     (detected by 3, t=5259 jiffies, g=637, q=198)
[   82.349633] Sending NMI from CPU 3 to CPUs 0:
[   82.349673] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10
[   82.350631] Sending NMI from CPU 3 to CPUs 4:
[   82.350656] NMI backtrace for cpu 4 skipped: idling at native_safe_halt+0xe/0x10
[   82.351628] Sending NMI from CPU 3 to CPUs 5:
[   82.351654] NMI backtrace for cpu 5 skipped: idling at native_safe_halt+0xe/0x10
[   82.352627] rcu: rcu_sched kthread starved for 5259 jiffies! g637 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=4
[   82.352652] rcu: RCU grace-period kthread stack dump:
[   82.352664] rcu_sched       I    0    10      2 0x80000000
[   82.352666] Call Trace:
[   82.352670]  ? __schedule+0x2a2/0x870
[   82.352671]  schedule+0x28/0x80
[   82.352672]  schedule_timeout+0x16b/0x390
[   82.352675]  ? __next_timer_interrupt+0xc0/0xc0
[   82.352676]  rcu_gp_kthread+0x40d/0x850
[   82.352678]  ? call_rcu_sched+0x20/0x20
[   82.352679]  kthread+0x112/0x130
[   82.352680]  ? kthread_bind+0x30/0x30
[   82.352681]  ret_from_fork+0x35/0x40
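To make reports like the ones above easier to compare, here is a minimal Python sketch that extracts the stalled CPUs, the detecting CPU, and the stall duration from each report. The function name `summarize_stalls` and the regular expressions are hypothetical helpers written for this post, not part of any tool. The conversion from jiffies to seconds assumes a guest kernel tick rate of HZ=250, which is not stated anywhere in the logs; under that assumption, t=5259 jiffies is about 21 seconds, which is consistent with the roughly 21 second gap between the two reports (61.29 and 82.35).

```python
import re

# Assumption: the guest kernel ticks at HZ=250. This value is NOT in the logs;
# verify it in the guest with: grep 'CONFIG_HZ=' /boot/config-$(uname -r)
ASSUMED_HZ = 250

# Matches per-CPU lines such as "rcu:     4-...!: (16 GPs behind) ..."
STALLED_CPU = re.compile(r"rcu:\s+(\d+)-\S+: \(")
# Matches summary lines such as "(detected by 2, t=5282 jiffies, g=633, q=71)"
DETECTED = re.compile(r"\(detected by (\d+), t=(\d+) jiffies")


def summarize_stalls(log_text, hz=ASSUMED_HZ):
    """Return one dict per stall report found in log_text."""
    reports = []
    stalled = []
    for line in log_text.splitlines():
        match = STALLED_CPU.search(line)
        if match:
            stalled.append(int(match.group(1)))
            continue
        match = DETECTED.search(line)
        if match:
            reports.append({
                "stalled_cpus": stalled,
                "detected_by": int(match.group(1)),
                "seconds": int(match.group(2)) / hz,
            })
            stalled = []  # start collecting CPUs for the next report
    return reports


# Lines copied from the second report in the dmesg output above.
SAMPLE = """\
[   82.349560] rcu:     0-...!: (1 GPs behind) idle=f3c/0/0x0 softirq=924/924 fqs=0
[   82.349581] rcu:     4-...!: (0 ticks this GP) idle=558/0/0x0 softirq=1782/1782 fqs=0
[   82.349599] rcu:     5-...!: (13 GPs behind) idle=204/0/0x0 softirq=864/864 fqs=0
[   82.349616] rcu:     (detected by 3, t=5259 jiffies, g=637, q=198)
"""

for report in summarize_stalls(SAMPLE):
    print(report)
```

The same function should also work on the /var/log/syslog format, since the patterns search anywhere in the line rather than anchoring at its start.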

From /var/log/syslog:

May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.290365] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.290391] rcu:      4-...!: (16 GPs behind) idle=4cc/0/0x0 softirq=1782/1782 fqs=1
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.290408] rcu:      (detected by 2, t=5282 jiffies, g=633, q=71)
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.290424] Sending NMI from CPU 2 to CPUs 4:
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.290471] NMI backtrace for cpu 4 skipped: idling at native_safe_halt+0xe/0x10
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291424] rcu: rcu_sched kthread starved for 5244 jiffies! g633 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=4
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291452] rcu: RCU grace-period kthread stack dump:
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291467] rcu_sched       I    0    10      2 0x80000000
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291468] Call Trace:
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291475]  ? __schedule+0x2a2/0x870
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291476]  schedule+0x28/0x80
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291478]  schedule_timeout+0x16b/0x390
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291480]  ? __next_timer_interrupt+0xc0/0xc0
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291483]  rcu_gp_kthread+0x40d/0x850
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291484]  ? call_rcu_sched+0x20/0x20
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291486]  kthread+0x112/0x130
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291487]  ? kthread_bind+0x30/0x30
May 15 11:09:31 165-netshaper-deb-1030 kernel: [   61.291488]  ret_from_fork+0x35/0x40

Can anyone help me, please? I don't know where it comes from or how to resolve it.

[EDIT] I just completely reinstalled Windows 10 Pro on my computer (from an ISO file) and then installed VirtualBox, and I'm still experiencing CPU issues in my VMs. No USB devices are connected to the VM. I'm now using Debian 10.4.

May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.265632] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.265657] rcu:         5-...!: (8 GPs behind) idle=6f4/0/0x0 softirq=701/701 fqs=1
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.265674] rcu:         (detected by 3, t=5261 jiffies, g=525, q=71)
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.265690] Sending NMI from CPU 3 to CPUs 5:
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.265716] NMI backtrace for cpu 5 skipped: idling at native_safe_halt+0xe/0x10
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266688] rcu: rcu_sched kthread starved for 5208 jiffies! g525 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=5
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266711] rcu: RCU grace-period kthread stack dump:
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266723] rcu_sched       I    0    10      2 0x80000000
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266725] Call Trace:
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266729]  ? __schedule+0x2a2/0x870
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266731]  schedule+0x28/0x80
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266732]  schedule_timeout+0x16b/0x390
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266734]  ? __next_timer_interrupt+0xc0/0xc0
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266735]  rcu_gp_kthread+0x40d/0x850
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266737]  ? call_rcu_sched+0x20/0x20
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266738]  kthread+0x112/0x130
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266739]  ? kthread_bind+0x30/0x30
May 19 12:02:24 102-ansible-deploy-deb-1040 kernel: [   53.266740]  ret_from_fork+0x35/0x40
May 19 12:02:48 102-ansible-deploy-deb-1040 systemd-timesyncd[293]: Synchronized to time server for the first time 51.159.6.183:123 (2.debian.pool.ntp.org).
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.971956] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.971984] rcu:         5-...!: (26 GPs behind) idle=cd8/0/0x0 softirq=717/717 fqs=1
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.972005] rcu:         (detected by 2, t=5252 jiffies, g=737, q=72)
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.972024] Sending NMI from CPU 2 to CPUs 5:
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.972052] NMI backtrace for cpu 5 skipped: idling at native_safe_halt+0xe/0x10
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973023] rcu: rcu_sched kthread starved for 5174 jiffies! g737 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=5
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973047] rcu: RCU grace-period kthread stack dump:
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973059] rcu_sched       I    0    10      2 0x80000000
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973060] Call Trace:
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973065]  ? __schedule+0x2a2/0x870
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973066]  schedule+0x28/0x80
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973067]  schedule_timeout+0x16b/0x390
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973070]  ? __next_timer_interrupt+0xc0/0xc0
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973071]  rcu_gp_kthread+0x40d/0x850
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973073]  ? call_rcu_sched+0x20/0x20
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973074]  kthread+0x112/0x130
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973075]  ? kthread_bind+0x30/0x30
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973076]  ret_from_fork+0x35/0x40
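One pattern visible in every report in this post is that the `->cpu=` value on the "kthread starved" line (as I understand it, the CPU the rcu_sched kthread last ran on) is one of the CPUs whose NMI backtrace was skipped as idle. The sketch below checks that pattern mechanically. The function name `kthread_cpu_was_idle` and both regular expressions are hypothetical, and a positive result only describes the log; it does not by itself identify the cause.

```python
import re

# Matches "rcu: rcu_sched kthread starved for 5174 jiffies! ... ->cpu=5"
STARVED = re.compile(r"kthread starved for (\d+) jiffies!.*->cpu=(\d+)")
# Matches "NMI backtrace for cpu 5 skipped: idling at native_safe_halt+0xe/0x10"
IDLE_SKIP = re.compile(r"NMI backtrace for cpu (\d+) skipped: idling")


def kthread_cpu_was_idle(log_text):
    """Return (cpu, was_idle) for each 'kthread starved' line in log_text.

    was_idle is True when the kthread's ->cpu also appeared in a preceding
    'NMI backtrace ... skipped: idling' line of the same report.
    """
    idle_cpus = set()
    results = []
    for line in log_text.splitlines():
        match = IDLE_SKIP.search(line)
        if match:
            idle_cpus.add(int(match.group(1)))
            continue
        match = STARVED.search(line)
        if match:
            cpu = int(match.group(2))
            results.append((cpu, cpu in idle_cpus))
            idle_cpus = set()  # the next report starts with a clean slate
    return results


# Lines copied from the last report in the syslog output above.
SAMPLE = """\
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.972024] Sending NMI from CPU 2 to CPUs 5:
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.972052] NMI backtrace for cpu 5 skipped: idling at native_safe_halt+0xe/0x10
May 19 12:03:23 102-ansible-deploy-deb-1040 kernel: [   96.973023] rcu: rcu_sched kthread starved for 5174 jiffies! g737 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=5
"""

print(kthread_cpu_was_idle(SAMPLE))
```

Running it over a full log gives one tuple per stall report, which makes it easy to see whether the pattern holds across reboots or after changing the VM's CPU count.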

In the VirtualBox settings, under System > Processor, I tried enabling PAE/NX and VT-x/AMD-V, but it changed nothing. I'm going to try Ubuntu and see whether the problem still occurs.

[EDIT]

Looks like the problem isn't happening on Ubuntu.

[EDIT 26/05/2020]

It looks like these bugs have nothing to do with the USB devices connected to the VMs.

Tags: linux, virtualbox, debian, cpu, kernel

asked on Super User May 15, 2020 by dmsakl • edited May 26, 2020 by dmsakl

0 Answers

Nobody has answered this question yet.

User contributions licensed under CC BY-SA 3.0