dnf install on CentOS 8 hangs an e2-medium instance

0

Trying to install a package or upgrade a CentOS 8 system makes an e2-medium instance unresponsive. This has happened to me repeatedly on multiple instances. Nothing peaks in monitoring for any prolonged period. The CPU load spikes briefly, but does not surpass 50 %, so does disk I/O, with ~7 MB/s read and ~150 read IOPS (still below the estimated performance limits for a 30 GB SSD persistent disk used for this instance). Memory utilization is constant at ~25 %. Changing the instance type to a n1-standard-1 helps and the installation/upgrade then proceeds speedily without errors.

Following can be found on the console output. Could this be a bug of the hypervisor? Surely an e2-medium instance should be powerful enough to install OS packages. CentOS 8.3.2011, fully up-to-date.

Jan 21 08:30:41 instancename systemd[1]: Starting dnf makecache...
Jan 21 08:30:55 instancename filebeat[1120]: 2021-01-21T08:30:55.871+0100#011INFO#011[monitoring]#011log/log.go:145#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":131072}}}},"cpu":{"system":{"ticks":90,"time":{"ms":2}},"total":{"ticks":230,"time":{"ms":5},"value":230},"user":{"ticks":140,"time":{"ms":3}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":11},"info":{"ephemeral_id":"a784951e-88c1-4222-a1ab-08363e8cf29c","uptime":{"ms":600277}},"memstats":{"gc_next":7335536,"memory_alloc":4873592,"memory_total":20972168},"runtime":{"goroutines":19}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":2}},"system":{"load":{"1":5.45,"15":1.05,"5":2.54,"norm":{"1":2.725,"15":0.525,"5":1.27}}}}}}
[  615.060049] INFO: task kworker/u4:0:8 blocked for more than 120 seconds.
[  615.067009]       Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
[  615.072913] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  615.081381] kworker/u4:0    D    0     8      2 0x80004000
[  615.087209] Workqueue: writeback wb_workfn (flush-8:0)
[  615.092521] Call Trace:
[  615.095112]  __schedule+0x2a6/0x700
[  615.098757]  ? wbt_exit+0x30/0x30
[  615.102213]  ? __wbt_done+0x30/0x30
[  615.105939]  schedule+0x38/0xa0
[  615.109216]  io_schedule+0x12/0x40
[  615.112936]  rq_qos_wait+0xfa/0x170
[  615.116715]  ? karma_partition+0x210/0x210
[  615.120963]  ? wbt_exit+0x30/0x30
[  615.124428]  wbt_wait+0x99/0xe0
[  615.127725]  __rq_qos_throttle+0x23/0x30
[  615.131873]  blk_mq_make_request+0x125/0x5d0
[  615.136386]  generic_make_request+0xcf/0x310
[  615.140928]  submit_bio+0x3c/0x160
[  615.144530]  ? xfs_prepare_ioend+0xc2/0xd0 [xfs]
[  615.149505]  iomap_submit_ioend.isra.38+0x4a/0x70
[  615.154394]  xfs_vm_writepages+0x64/0x90 [xfs]
[  615.159159]  do_writepages+0x41/0xd0
[  615.162889]  __writeback_single_inode+0x3d/0x360
[  615.167730]  writeback_sb_inodes+0x1e3/0x450
[  615.172227]  __writeback_inodes_wb+0x5d/0xb0
[  615.177091]  wb_writeback+0x25f/0x2f0
[  615.181207]  ? cpumask_next+0x17/0x20
[  615.185015]  wb_workfn+0x333/0x3f0
[  615.188558]  ? __switch_to_asm+0x35/0x70
[  615.192712]  process_one_work+0x1a7/0x360
[  615.196889]  worker_thread+0x30/0x390
[  615.200814]  ? create_worker+0x1a0/0x1a0
[  615.204880]  kthread+0x112/0x130
[  615.208248]  ? kthread_flush_work_fn+0x10/0x10
[  615.212833]  ret_from_fork+0x35/0x40
[  615.216733] INFO: task auditd:679 blocked for more than 120 seconds.
[  615.223229]       Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
[  615.229020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  615.237115] auditd          D    0   679      1 0x00000320
[  615.243158] Call Trace:
[  615.246033]  __schedule+0x2a6/0x700
[  615.249772]  ? mempool_alloc+0x67/0x190
[  615.254019]  ? wbt_exit+0x30/0x30
[  615.257534]  ? __wbt_done+0x30/0x30
[  615.261471]  schedule+0x38/0xa0
[  615.265307]  io_schedule+0x12/0x40
[  615.268972]  rq_qos_wait+0xfa/0x170
[  615.272856]  ? karma_partition+0x210/0x210
[  615.277098]  ? wbt_exit+0x30/0x30
[  615.280684]  wbt_wait+0x99/0xe0
[  615.283989]  __rq_qos_throttle+0x23/0x30
[  615.288152]  blk_mq_make_request+0x125/0x5d0
[  615.292609]  generic_make_request+0xcf/0x310
[  615.297120]  submit_bio+0x3c/0x160
[  615.300809]  ? xfs_prepare_ioend+0xc2/0xd0 [xfs]
[  615.305583]  iomap_submit_ioend.isra.38+0x4a/0x70
[  615.310626]  xfs_vm_writepages+0x64/0x90 [xfs]
[  615.315321]  do_writepages+0x41/0xd0
[  615.319228]  __filemap_fdatawrite_range+0xbe/0xf0
[  615.324076]  file_write_and_wait_range+0x4c/0xa0
Jan 21 08:30:56 [  615.328888]  xfs_file_fsync+0x5d/0x1e0 [xfs]
[  615.334687]  do_fsync+0x38/0x70
instancename ker[  615.338159]  __x64_sys_fsync+0x10/0x20
[  615.343417]  do_syscall_64+0x5b/0x1a0
nel: INFO: task [  615.347246]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  615.353984] RIP: 0033:0x7f4555a00e57
kworker/u4:0:8 b[  615.357890] Code: Bad RIP value.
[  615.362766] RSP: 002b:00007f4553378d00 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
locked for more [  615.370709] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f4555a00e57
[  615.379456] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
[  615.386755] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f4553379700
[  615.394155] R10: 0000000000000000 R11: 0000000000000293 R12: 00007fffcdb9149e
[  615.401812] R13: 00007fffcdb9149f R14: 0000000000000000 R15: 00007f4553378e80
than 120 seconds[  615.409225] INFO: task kworker/1:3:1731 blocked for more than 120 seconds.
[  615.417605]       Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
[  615.423406] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  615.431453] kworker/1:3     D    0  1731      2 0x80004080
.
Jan 21 08:30:[  615.437185] Workqueue: xfs-sync/sda1 xfs_log_worker [xfs]
[  615.444063] Call Trace:
[  615.446658]  __schedule+0x2a6/0x700
56 instancename [  615.450278]  ? wbt_exit+0x30/0x30
[  615.455116]  ? __wbt_done+0x30/0x30
[  615.458836]  schedule+0x38/0xa0
kernel:      Not[  615.462118]  io_schedule+0x12/0x40
[  615.467115]  rq_qos_wait+0xfa/0x170
[  615.470771]  ? karma_partition+0x210/0x210
 tainted 4.18.0-[  615.474997]  ? wbt_exit+0x30/0x30
[  615.479902]  wbt_wait+0x99/0xe0
[  615.483175]  __rq_qos_throttle+0x23/0x30
[  615.487231]  blk_mq_make_request+0x125/0x5d0
[  615.491647]  generic_make_request+0xcf/0x310
240.1.1.el8_3.x8[  615.496049]  ? bio_add_page+0x42/0x50
[  615.501214]  submit_bio+0x3c/0x160
6_64 #1
Jan 21 [  615.504816]  xlog_state_release_iclog+0x5f/0xa0 [xfs]
08:30:56 instanc[  615.511398]  xfs_log_force+0x23c/0x2e0 [xfs]
[  615.517508]  ? __switch_to_asm+0x35/0x70
ename kernel: "e[  615.521564]  ? __switch_to_asm+0x41/0x70
[  615.527005]  ? finish_task_switch+0x77/0x2a0
cho 0 > /proc/sy[  615.531547]  xfs_log_worker+0x2f/0xf0 [xfs]
[  615.537289]  process_one_work+0x1a7/0x360
[  615.541569]  worker_thread+0x30/0x390
s/kernel/hung_ta[  615.545779]  ? create_worker+0x1a0/0x1a0
sk_timeout_secs"[  615.551370]  kthread+0x112/0x130
[  615.556120]  ? kthread_flush_work_fn+0x10/0x10
 disables this m[  615.560762]  ret_from_fork+0x35/0x40
essage.
Jan 21 08:30:56 instancename kernel: kworker/u4:0    D    0     8      2 0x80004000
Jan 21 08:30:56 instancename kernel: Workqueue: writeback wb_workfn (flush-8:0)
Jan 21 08:30:56 instancename kernel: Call Trace:
Jan 21 08:30:56 instancename kernel: __schedule+0x2a6/0x700
Jan 21 08:30:56 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:56 instancename kernel: ? __wbt_done+0x30/0x30
Jan 21 08:30:56 instancename kernel: schedule+0x38/0xa0
Jan 21 08:30:56 instancename kernel: io_schedule+0x12/0x40
Jan 21 08:30:56 instancename kernel: rq_qos_wait+0xfa/0x170
Jan 21 08:30:56 instancename kernel: ? karma_partition+0x210/0x210
Jan 21 08:30:56 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:56 instancename kernel: wbt_wait+0x99/0xe0
Jan 21 08:30:56 instancename kernel: __rq_qos_throttle+0x23/0x30
Jan 21 08:30:56 instancename kernel: blk_mq_make_request+0x125/0x5d0
Jan 21 08:30:56 instancename kernel: generic_make_request+0xcf/0x310
Jan 21 08:30:56 instancename kernel: submit_bio+0x3c/0x160
Jan 21 08:30:56 instancename kernel: ? xfs_prepare_ioend+0xc2/0xd0 [xfs]
Jan 21 08:30:56 instancename kernel: iomap_submit_ioend.isra.38+0x4a/0x70
Jan 21 08:30:56 instancename kernel: xfs_vm_writepages+0x64/0x90 [xfs]
Jan 21 08:30:56 instancename kernel: do_writepages+0x41/0xd0
Jan 21 08:30:56 instancename kernel: __writeback_single_inode+0x3d/0x360
Jan 21 08:30:56 instancename kernel: writeback_sb_inodes+0x1e3/0x450
Jan 21 08:30:56 instancename kernel: __writeback_inodes_wb+0x5d/0xb0
Jan 21 08:30:56 instancename kernel: wb_writeback+0x25f/0x2f0
Jan 21 08:30:56 instancename kernel: ? cpumask_next+0x17/0x20
Jan 21 08:30:56 instancename kernel: wb_workfn+0x333/0x3f0
Jan 21 08:30:56 instancename kernel: ? __switch_to_asm+0x35/0x70
Jan 21 08:30:56 instancename kernel: process_one_work+0x1a7/0x360
Jan 21 08:30:56 instancename kernel: worker_thread+0x30/0x390
Jan 21 08:30:56 instancename kernel: ? create_worker+0x1a0/0x1a0
Jan 21 08:30:56 instancename kernel: kthread+0x112/0x130
Jan 21 08:30:56 instancename kernel: ? kthread_flush_work_fn+0x10/0x10
Jan 21 08:30:56 instancename kernel: ret_from_fork+0x35/0x40
Jan 21 08:30:56 instancename kernel: INFO: task auditd:679 blocked for more than 120 seconds.
Jan 21 08:30:56 instancename kernel:      Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
Jan 21 08:30:56 instancename kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 21 08:30:56 instancename kernel: auditd          D    0   679      1 0x00000320
Jan 21 08:30:56 instancename kernel: Call Trace:
Jan 21 08:30:56 instancename kernel: __schedule+0x2a6/0x700
Jan 21 08:30:56 instancename kernel: ? mempool_alloc+0x67/0x190
Jan 21 08:30:56 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:56 instancename kernel: ? __wbt_done+0x30/0x30
Jan 21 08:30:56 instancename kernel: schedule+0x38/0xa0
Jan 21 08:30:56 instancename kernel: io_schedule+0x12/0x40
Jan 21 08:30:56 instancename kernel: rq_qos_wait+0xfa/0x170
Jan 21 08:30:56 instancename kernel: ? karma_partition+0x210/0x210
Jan 21 08:30:56 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:56 instancename kernel: wbt_wait+0x99/0xe0
Jan 21 08:30:56 instancename kernel: __rq_qos_throttle+0x23/0x30
Jan 21 08:30:56 instancename kernel: blk_mq_make_request+0x125/0x5d0
Jan 21 08:30:56 instancename kernel: generic_make_request+0xcf/0x310
Jan 21 08:30:56 instancename kernel: submit_bio+0x3c/0x160
Jan 21 08:30:56 instancename kernel: ? xfs_prepare_ioend+0xc2/0xd0 [xfs]
Jan 21 08:30:56 instancename kernel: iomap_submit_ioend.isra.38+0x4a/0x70
Jan 21 08:30:56 instancename kernel: xfs_vm_writepages+0x64/0x90 [xfs]
Jan 21 08:30:56 instancename kernel: do_writepages+0x41/0xd0
Jan 21 08:30:56 instancename kernel: __filemap_fdatawrite_range+0xbe/0xf0
Jan 21 08:30:56 instancename kernel: file_write_and_wait_range+0x4c/0xa0
Jan 21 08:30:56 instancename kernel: xfs_file_fsync+0x5d/0x1e0 [xfs]
Jan 21 08:30:56 instancename kernel: do_fsync+0x38/0x70
Jan 21 08:30:56 instancename kernel: __x64_sys_fsync+0x10/0x20
Jan 21 08:30:56 instancename kernel: do_syscall_64+0x5b/0x1a0
Jan 21 08:30:56 instancename kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
Jan 21 08:30:57 instancename kernel: RIP: 0033:0x7f4555a00e57
Jan 21 08:30:57 instancename kernel: Code: Bad RIP value.
Jan 21 08:30:57 instancename kernel: RSP: 002b:00007f4553378d00 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Jan 21 08:30:57 instancename kernel: RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f4555a00e57
Jan 21 08:30:57 instancename kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
Jan 21 08:30:57 instancename kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f4553379700
Jan 21 08:30:57 instancename kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007fffcdb9149e
Jan 21 08:30:57 instancename kernel: R13: 00007fffcdb9149f R14: 0000000000000000 R15: 00007f4553378e80
Jan 21 08:30:57 instancename kernel: INFO: task kworker/1:3:1731 blocked for more than 120 seconds.
Jan 21 08:30:57 instancename kernel:      Not tainted 4.18.0-240.1.1.el8_3.x86_64 #1
Jan 21 08:30:57 instancename kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 21 08:30:57 instancename kernel: kworker/1:3     D    0  1731      2 0x80004080
Jan 21 08:30:57 instancename kernel: Workqueue: xfs-sync/sda1 xfs_log_worker [xfs]
Jan 21 08:30:57 instancename kernel: Call Trace:
Jan 21 08:30:57 instancename kernel: __schedule+0x2a6/0x700
Jan 21 08:30:57 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:57 instancename kernel: ? __wbt_done+0x30/0x30
Jan 21 08:30:57 instancename kernel: schedule+0x38/0xa0
Jan 21 08:30:57 instancename kernel: io_schedule+0x12/0x40
Jan 21 08:30:57 instancename kernel: rq_qos_wait+0xfa/0x170
Jan 21 08:30:57 instancename kernel: ? karma_partition+0x210/0x210
Jan 21 08:30:57 instancename kernel: ? wbt_exit+0x30/0x30
Jan 21 08:30:57 instancename kernel: wbt_wait+0x99/0xe0
Jan 21 08:30:57 instancename kernel: __rq_qos_throttle+0x23/0x30
Jan 21 08:30:57 instancename kernel: blk_mq_make_request+0x125/0x5d0
Jan 21 08:30:57 instancename kernel: generic_make_request+0xcf/0x310
Jan 21 08:30:57 instancename kernel: ? bio_add_page+0x42/0x50
Jan 21 08:30:57 instancename kernel: submit_bio+0x3c/0x160
Jan 21 08:30:57 instancename kernel: xlog_state_release_iclog+0x5f/0xa0 [xfs]
Jan 21 08:30:57 instancename kernel: xfs_log_force+0x23c/0x2e0 [xfs]
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: can not take infinite value
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing.
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: can not take infinite value
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing.
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: can not take infinite value
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing.
Jan 21 08:30:57 instancename kernel: ? __switch_to_asm+0x35/0x70
Jan 21 08:30:57 instancename kernel: ? __switch_to_asm+0x41/0x70
Jan 21 08:30:57 instancename kernel: ? finish_task_switch+0x77/0x2a0
Jan 21 08:30:57 instancename kernel: xfs_log_worker+0x2f/0xf0 [xfs]
Jan 21 08:30:57 instancename kernel: process_one_work+0x1a7/0x360
Jan 21 08:30:57 instancename kernel: worker_thread+0x30/0x390
Jan 21 08:30:57 instancename kernel: ? create_worker+0x1a0/0x1a0
Jan 21 08:30:57 instancename kernel: kthread+0x112/0x130
Jan 21 08:30:57 instancename kernel: ? kthread_flush_work_fn+0x10/0x10
Jan 21 08:30:57 instancename kernel: ret_from_fork+0x35/0x40
Jan 21 08:31:25 instancename filebeat[1120]: 2021-01-21T08:31:25.871+0100#011INFO#011[monitoring]#011log/log.go:145#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":-126976}}}},"cpu":{"system":{"ticks":90,"time":{"ms":5}},"total":{"ticks":240,"time":{"ms":10},"value":240},"user":{"ticks":150,"time":{"ms":5}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":11},"info":{"ephemeral_id":"a784951e-88c1-4222-a1ab-08363e8cf29c","uptime":{"ms":630279}},"memstats":{"gc_next":7327888,"memory_alloc":3678280,"memory_total":21368912},"runtime":{"goroutines":19}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":2}},"system":{"load":{"1":6.46,"15":1.27,"5":3.06,"norm":{"1":3.23,"15":0.635,"5":1.53}}}}}}
Jan 21 08:31:55 instancename filebeat[1120]: 2021-01-21T08:31:55.871+0100#011INFO#011[monitoring]#011log/log.go:145#011Non-zero metrics in the last 30s#011{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":100,"time":{"ms":3}},"total":{"ticks":250,"time":{"ms":4},"value":250},"user":{"ticks":150,"time":{"ms":1}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":11},"info":{"ephemeral_id":"a784951e-88c1-4222-a1ab-08363e8cf29c","uptime":{"ms":660278}},"memstats":{"gc_next":7327888,"memory_alloc":4167216,"memory_total":21857848},"runtime":{"goroutines":19}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":2}},"system":{"load":{"1":7.22,"15":1.5,"5":3.56,"norm":{"1":3.61,"15":0.75,"5":1.78}}}}}}
(...)
google-cloud-platform
google-compute-engine
centos8
asked on Server Fault Jan 21, 2021 by Peter G

1 Answer

0

As per this piece of log:

Jan 21 08:30:57 instancename collectd[1464]: write_gcm: can not take infinite value
Jan 21 08:30:57 instancename collectd[1464]: write_gcm: 
wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing.

It is indicating that the VM instance does not have swap memory, hence this metric plugin tries to divide by 0.

first of all, go to /etc/stackdriver/collectd.conf and check that the following config is set up:

LoadPlugin swap
<Plugin "swap">
  ValuesPercentage true
</Plugin>

Then, run sudo service stackdriver-agent restart

Then run free -m to check the swap memory.

Hope it helps!

answered on Server Fault Feb 16, 2021 by BraveAdmin

User contributions licensed under CC BY-SA 3.0