Workqueue hung while processing blk_mq_run_work_fn

A workqueue hangs on my board (ARM Linux). At first the board can be reached over ssh. Later the ssh connection is still established, but the shell prompt never appears. I captured some information with sysrq; part of the output looks like this:

kworker/0:2H    R  running task        0 19783      2 0x00000028
Workqueue: kblockd blk_mq_run_work_fn
Call trace:
 __switch_to+0xf4/0x120
 __schedule+0x248/0x460
 preempt_schedule_common+0x24/0x4c
 preempt_schedule+0x28/0x30
 _raw_spin_unlock_irqrestore+0x30/0x4c
 __wake_up_common_lock+0x88/0xc4
 __wake_up+0x14/0x1c
 wake_up_bit+0x78/0xa0
 end_buffer_read_sync+0x44/0xa4
 end_bio_bh_io_sync+0x30/0x60
 bio_endio+0xdc/0x110
 blk_update_request+0xb8/0x250
 mtd_blktrans_work+0xdc/0x1a0
 mtd_queue_rq+0x50/0x84
 blk_mq_dispatch_rq_list+0xa8/0x43c
 blk_mq_do_dispatch_sched+0x78/0x110
 blk_mq_sched_dispatch_requests+0x118/0x190
 __blk_mq_run_hw_queue+0xc4/0x114
 blk_mq_run_work_fn+0x1c/0x24
 process_one_work+0x1c8/0x324
 worker_thread+0x68/0x3ac
 kthread+0x13c/0x150
 ret_from_fork+0x10/0x1c
ipc_Session2    D    0  8552   8441 0x00000000
Call trace:
 __switch_to+0xf4/0x120
 __schedule+0x248/0x460
 schedule+0x40/0xe0
 squashfs_cache_get+0x2f8/0x340
 squashfs_get_datablock+0x1c/0x24
 squashfs_readpage_block+0x34/0x90
 squashfs_readpage+0x240/0x27c
 read_pages.isra.0+0x118/0x180
 __do_page_cache_readahead+0x19c/0x1c0
 do_sync_mmap_readahead+0xcc/0x174
 filemap_fault+0x548/0x6e0
 __do_fault+0x38/0xfc
 do_fault+0xb4/0x1b0
 handle_pte_fault+0x68/0x19c
 __handle_mm_fault+0xcc/0x120
 handle_mm_fault+0x8c/0xd4
 do_page_fault+0x11c/0x3e0
 do_translation_fault+0xa4/0xb0
 do_mem_abort+0x3c/0xa0
 do_el0_ia_bp_hardening+0x3c/0xb0
 el0_ia+0x18/0x1c
ipc_Session3    D    0  8598   8441 0x00000000
Call trace:
 __switch_to+0xf4/0x120
 __schedule+0x248/0x460
 schedule+0x40/0xe0
 squashfs_cache_get+0x2f8/0x340
 squashfs_get_datablock+0x1c/0x24
 squashfs_readpage_block+0x34/0x90
 squashfs_readpage+0x240/0x27c
 read_pages.isra.0+0x118/0x180
 __do_page_cache_readahead+0x19c/0x1c0
 do_sync_mmap_readahead+0xcc/0x174
 filemap_fault+0x548/0x6e0
 __do_fault+0x38/0xfc
 do_fault+0xb4/0x1b0
 handle_pte_fault+0x68/0x19c
 __handle_mm_fault+0xcc/0x120
 handle_mm_fault+0x8c/0xd4
 do_page_fault+0x11c/0x3e0
 do_translation_fault+0xa4/0xb0
 do_mem_abort+0x3c/0xa0
 do_el0_ia_bp_hardening+0x3c/0xb0
 el0_ia+0x18/0x1c
...
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x80
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=3/256 refcnt=4
    pending: phy_state_machine, neigh_periodic_work, do_cache_clean
workqueue mm_percpu_wq: flags=0x8
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_update
workqueue writeback: flags=0x4a
  pwq 4: cpus=0-1 flags=0x4 nice=0 active=2/256 refcnt=4
    in-flight: 8294:wb_workfn wb_workfn
workqueue kblockd: flags=0x18                                        
  pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=2/256 refcnt=3      
    in-flight: 19783:blk_mq_run_work_fn
    pending: blk_mq_run_work_fn
workqueue mmc_complete: flags=0x18
  pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
    pending: mmc_blk_mq_complete_work
pool 1: cpus=0 node=0 flags=0x0 nice=-20 hung=21394s workers=3 idle: 6724 1804  
pool 4: cpus=0-1 flags=0x4 nice=0 hung=0s workers=3 idle: 19890 12972
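
The `hung=` field on the pool lines above can be pulled out and converted to hours with a short script. This is only a minimal sketch: the function name `parse_hung_pools` and the regular expression are my own assumptions based on the line layout in this dump, not a kernel-provided tool, and other kernel versions may print the line differently.

```python
import re

# Assumed layout of a "pool" line in the sysrq workqueue dump, e.g.
# "pool 1: cpus=0 node=0 flags=0x0 nice=-20 hung=21394s workers=3 idle: 6724 1804"
POOL_RE = re.compile(r"^pool (\d+):.*?\bhung=(\d+)s\b")


def parse_hung_pools(text):
    """Return {pool_id: hung_seconds} for every pool line found in text."""
    result = {}
    for line in text.splitlines():
        match = POOL_RE.match(line.strip())
        if match:
            result[int(match.group(1))] = int(match.group(2))
    return result


# The two pool lines copied from the dump above.
sample = """\
pool 1: cpus=0 node=0 flags=0x0 nice=-20 hung=21394s workers=3 idle: 6724 1804
pool 4: cpus=0-1 flags=0x4 nice=0 hung=0s workers=3 idle: 19890 12972
"""

if __name__ == "__main__":
    for pool_id, seconds in parse_hung_pools(sample).items():
        print(f"pool {pool_id}: hung for {seconds}s ({seconds / 3600:.1f} h)")
```

For the dump above this gives 21394 seconds for pool 1, which is about 5.9 hours, and 0 seconds for pool 4.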

As shown above, pool 1 has been hung for 5.9 hours (21394s). The stuck work item is most likely blk_mq_run_work_fn, or possibly mmc_blk_mq_complete_work. Many threads and processes are also in the D state, as shown:

/usr/bin# top
Mem: 487160K used, 12976K free, 1172K shrd, 9344K buff, 51200K cached
CPU:   0% usr  54% sys   0% nic   0% idle  45% io   0% irq   0% sirq
Load average: 90.99 90.18 88.74 5/226 30760
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 8445  8441 root     D     458m  94%  50% /usr/bin/app
30760 29801 root     R     3300   1%   5% top
 8444  8441 root     D     9688   2%   0% /usr/bin/Daemon
  329     1 root     S     3724   1%   0% /sbin/logd -S 1024
  384     1 root     S     3440   1%   0% /usr/sbin/crond -f -c /etc/crontabs -
  205     1 root     S     3440   1%   0% /bin/ash --login
29592     1 root     D     3440   1%   0% -ash
 8939     1 root     D     3440   1%   0% -ash
10680     1 root     D     3440   1%   0% -ash
 9210     1 root     D     3440   1%   0% -ash
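
To keep track of which tasks are stuck, the D-state rows in the `top` output above can be filtered with a small script. This is a rough sketch under stated assumptions: the helper name `d_state_tasks` is hypothetical, and it assumes the eight-column layout shown above (PID, PPID, USER, STAT, VSZ, %VSZ, %CPU, COMMAND) with STAT printed as a single token, which may not hold for every `top` build.

```python
def d_state_tasks(top_output):
    """Return (pid, command) pairs whose STAT column starts with 'D'.

    Assumes the 8-column layout from the listing above and a STAT
    column without embedded spaces.
    """
    tasks = []
    for line in top_output.splitlines():
        fields = line.split(None, 7)
        # Data rows start with a numeric PID and have all 8 columns;
        # this skips the Mem/CPU/Load summary lines and the header.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        pid, stat, command = int(fields[0]), fields[3], fields[7]
        if stat.startswith("D"):
            tasks.append((pid, command))
    return tasks


# Process rows copied from the listing above.
sample_top = """\
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 8445  8441 root     D     458m  94%  50% /usr/bin/app
30760 29801 root     R     3300   1%   5% top
 8444  8441 root     D     9688   2%   0% /usr/bin/Daemon
  329     1 root     S     3724   1%   0% /sbin/logd -S 1024
  384     1 root     S     3440   1%   0% /usr/sbin/crond -f -c /etc/crontabs -
  205     1 root     S     3440   1%   0% /bin/ash --login
29592     1 root     D     3440   1%   0% -ash
 8939     1 root     D     3440   1%   0% -ash
10680     1 root     D     3440   1%   0% -ash
 9210     1 root     D     3440   1%   0% -ash
"""

if __name__ == "__main__":
    for pid, command in d_state_tasks(sample_top):
        print(pid, command)
```

On the listing above this picks out six tasks: /usr/bin/app, /usr/bin/Daemon and four -ash shells.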

Can anyone tell me why this happens and how to deal with it? Thanks.

linux-kernel
hung
workqueue
asked on Stack Overflow Mar 8, 2021 by yao

0 Answers

Nobody has answered this question yet.

