ZFSonLinux: process stuck in D state, CPU at 100% iowait, HDD idle


I have a problem that occurs regularly, roughly every 5 seconds: the rtorrent process gets stuck in disk wait (D state) for approximately 1-2 seconds, then continues at full speed. Sometimes the hangs last minutes. The workload is very heavy random-read I/O with only a few writes.

Tried strace, and it shows the freezes happening right after mmap() calls.
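To confirm where a D-state task is actually blocked, the kernel-side stack can be read directly from procfs. A minimal sketch (assumes root; `/proc/<pid>/stack` is only available on kernels with stack tracing enabled):

```shell
# List tasks in uninterruptible sleep (state D, including D+, Ds, ...)
# and print each one's kernel stack. Reading /proc/<pid>/stack
# typically requires root.
for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
    echo "== PID $pid ($(cat /proc/$pid/comm 2>/dev/null)) =="
    cat "/proc/$pid/stack" 2>/dev/null || echo "(stack unavailable)"
done
```

This gives the same information as the hung-task warnings in dmesg below, but on demand, without waiting 120 seconds for the watchdog.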

EDIT: solved by moving the workload into a FreeBSD VM.

Setup:

  • the rtorrent process runs inside LXC; tried running it directly, no difference
  • single-HDD zpool for data, mirrored zpool for the OS, using LZ4 compression
  • sda - 128 GB SSD, with LUKS device home_01 on top
  • sdb - 3 TB HDD, with LUKS devices home_02 and disk3000 on top
  • options zfs zfs_arc_max=12G

rc.local (the problem happens even without these settings):

echo noop > /sys/block/sda/queue/scheduler
echo noop > /sys/block/sdb/queue/scheduler
echo noop > /sys/block/sdc/queue/scheduler
echo noop > /sys/block/sdd/queue/scheduler
modprobe cpufreq_powersave
echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo powersave > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch
echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
echo 600 > /sys/module/zfs/parameters/zfs_txg_timeout 
echo 1000000 > /sys/module/zfs/parameters/l2arc_write_max
echo 1000000 > /sys/module/zfs/parameters/l2arc_write_boost
  • Debian jessie amd64, Pentium G4400, 16 GB RAM (non-ECC)

    BOOT_IMAGE=/root@/boot/vmlinuz-4.3.0-0.bpo.1-amd64 root=ZFS=carbon-system/root ro boot=zfs quiet cgroup_enable=memory
    ZFS: Loaded module v0.6.5.2-2, ZFS pool version 5000, ZFS filesystem version 5
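As a sanity check, the module parameters set in rc.local can be read back after boot. A small sketch (the sysfs files exist only while the zfs module is loaded, hence the fallback message):

```shell
# Read back the ZFS module parameters that rc.local writes at boot.
for p in zfs_prefetch_disable zfs_txg_timeout l2arc_noprefetch \
         l2arc_write_max l2arc_write_boost; do
    printf '%s = ' "$p"
    cat "/sys/module/zfs/parameters/$p" 2>/dev/null || echo '(zfs module not loaded)'
done
```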

dmesg:

[  599.819766] INFO: task rtorrent main:2635 blocked for more than 120 seconds.
[  599.819886]       Tainted: P           O  3.16.0-4-amd64 #1
[  599.819998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  599.820119] rtorrent main   D ffff88040ccf8fb8     0  2635 2629 0x00000000
[  599.820125]  ffff88040ccf8b60 0000000000000082 0000000000012f00 ffff88041536ffd8
[  599.820130]  0000000000012f00 ffff88040ccf8b60 ffff880436d137b0 ffff880436fd6508
[  599.820135]  0000000000000002 ffffffff8113ce00 ffff88041536f590 ffff88041536f668
[  599.820140] Call Trace:
[  599.820147]  [<ffffffff8113ce00>] ? wait_on_page_read+0x60/0x60
[  599.820152]  [<ffffffff81510759>] ? io_schedule+0x99/0x120
[  599.820157]  [<ffffffff8113ce0a>] ? sleep_on_page+0xa/0x10
[  599.820162]  [<ffffffff81510adc>] ? __wait_on_bit+0x5c/0x90
[  599.820167]  [<ffffffff8113cbff>] ? wait_on_page_bit+0x7f/0x90
[  599.820173]  [<ffffffff810a7bd0>] ? autoremove_wake_function+0x30/0x30
[  599.820179]  [<ffffffff8114f3fe>] ? page_evictable+0xe/0x40
[  599.820185]  [<ffffffff8114fb47>] ? shrink_page_list+0x677/0xa50
[  599.820193]  [<ffffffff81150552>] ? shrink_inactive_list+0x192/0x500
[  599.820199]  [<ffffffff81151161>] ? shrink_lruvec+0x511/0x6a0
[  599.820207]  [<ffffffff810a1ec1>] ? load_balance+0x191/0x850
[  599.820213]  [<ffffffff810a1ec1>] ? load_balance+0x191/0x850
[  599.820220]  [<ffffffff81151364>] ? shrink_zone+0x74/0x1b0
[  599.820226]  [<ffffffff8115189d>] ? do_try_to_free_pages+0x12d/0x520
[  599.820233]  [<ffffffff81151e7f>] ? try_to_free_mem_cgroup_pages+0x9f/0x130
[  599.820239]  [<ffffffff8119e5ae>] ? mem_cgroup_reclaim+0x3e/0xc0
[  599.820244]  [<ffffffff8119f18f>] ? mem_cgroup_try_charge+0x1df/0x440
[  599.820281]  [<ffffffffa0277999>] ? dnode_rele_and_unlock+0x49/0x80 [zfs]
[  599.820286]  [<ffffffff8119fa50>] ? mem_cgroup_try_charge_mm+0x30/0xa0
[  599.820292]  [<ffffffff8151244e>] ? mutex_lock+0xe/0x2a
[  599.820297]  [<ffffffff811a0cff>] ? mem_cgroup_charge_file+0x6f/0xc0
[  599.820330]  [<ffffffffa02e83f6>] ? zfs_getpage+0x156/0x200 [zfs]
[  599.820337]  [<ffffffff8113daef>] ? __add_to_page_cache_locked+0x3f/0x1f0
[  599.820363]  [<ffffffffa0303010>] ? zpl_writepages+0x170/0x170 [zfs]
[  599.820369]  [<ffffffff8113dcd2>] ? add_to_page_cache_lru+0x22/0x70
[  599.820374]  [<ffffffff8114973b>] ? read_cache_pages+0x7b/0x120
[  599.820379]  [<ffffffff8114996b>] ? __do_page_cache_readahead+0x18b/0x210
[  599.820385]  [<ffffffff81471eaf>] ? tcp_write_xmit+0x17f/0xca0
[  599.820390]  [<ffffffff81149b39>] ? ondemand_readahead+0x149/0x280
[  599.820396]  [<ffffffff8113dd46>] ? pagecache_get_page_fixed+0x26/0x1b0
[  599.820402]  [<ffffffff8113f71f>] ? filemap_fault+0x2cf/0x420
[  599.820407]  [<ffffffff8116548a>] ? __do_fault+0x3a/0xa0
[  599.820412]  [<ffffffff8116804e>] ? do_read_fault.isra.54+0x4e/0x300
[  599.820418]  [<ffffffff8116987c>] ? handle_mm_fault+0x63c/0x11c0
[  599.820424]  [<ffffffff8116fcfc>] ? mmap_region+0x19c/0x650
[  599.820430]  [<ffffffff810572b7>] ? __do_page_fault+0x177/0x4f0
[  599.820436]  [<ffffffff814062a8>] ? SYSC_sendto+0x148/0x180
[  599.820444]  [<ffffffff810115dc>] ? __switch_to+0x15c/0x570
[  599.820449]  [<ffffffff81515ce8>] ? page_fault+0x28/0x30 

zpool status:

  pool: zfs-home
 state: ONLINE
  scan: scrub canceled on Sat Feb 27 15:25:35 2016
config:

    NAME        STATE     READ WRITE CKSUM
    zfs-home    ONLINE       0     0     0
      disk3000  ONLINE       0     0     0

errors: No known data errors 

Tried both kernels, 3.16 and 4.3; it happens on both.

root@carbon:~# cat /proc/version 
Linux version 4.3.0-0.bpo.1-amd64 (debian-kernel@lists.debian.org) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Debian 4.3.3-7~bpo8+1 (2016-01-19)
root@carbon:~# dkms status
spl, 0.6.5, 3.16.0-4-amd64, x86_64: installed
spl, 0.6.5, 4.3.0-0.bpo.1-amd64, x86_64: installed
zfs, 0.6.5.2, 3.16.0-4-amd64, x86_64: installed
zfs, 0.6.5.2, 4.3.0-0.bpo.1-amd64, x86_64: installed

ioping during the freezes:

root@carbon:/sys/module/zfs/parameters# ioping /dev/mapper/disk3000
4 KiB from /dev/mapper/disk3000 (block device 2.58 TiB): request=1 time=10.6 ms
4 KiB from /dev/mapper/disk3000 (block device 2.58 TiB): request=2 time=19.8 ms
4 KiB from /dev/mapper/disk3000 (block device 2.58 TiB): request=3 time=20.0 ms
4 KiB from /dev/mapper/disk3000 (block device 2.58 TiB): request=4 time=23.4 ms
4 KiB from /dev/mapper/disk3000 (block device 2.58 TiB): request=5 time=21.0 ms 

Running ioping on a file inside the ZFS filesystem shows the same freezes; other processes such as apache and smbd are blocked too.

iostat:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00   50.00    0.00   50.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.00   49.75    0.00   49.75

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00 0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

iotop -ob:

Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO COMMAND
Total DISK READ :     102.94 K/s | Total DISK WRITE :     976.02 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO COMMAND
12908 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.02 % [kworker/1:1]
25787 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.02 % [kworker/0:1]
20295 be/4 www-data  102.94 K/s  976.02 B/s  0.00 %  0.00 % apache2 -k start
Total DISK READ :      55.13 K/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO COMMAND
12908 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.13 % [kworker/1:1]
25787 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.04 % [kworker/0:1]
19720 be/4 root        3.80 K/s    0.00 B/s  0.00 %  0.00 % python /usr/sbin/iotop -ob
20295 be/4 www-data   51.33 K/s    0.00 B/s  0.00 %  0.00 % apache2 -k start
Total DISK READ :      51.38 K/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO COMMAND
12908 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.02 % [kworker/1:1]
25787 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.02 % [kworker/0:1]
20295 be/4 www-data   51.38 K/s    0.00 B/s  0.00 %  0.00 % apache2 -k start

ZFS parameters:

root@carbon:~# zpool get all zfs-home
NAME      PROPERTY                    VALUE SOURCE
zfs-home  size                        2.58T -
zfs-home  capacity                    80% -
zfs-home  altroot                     - default
zfs-home  health                      ONLINE -
zfs-home  guid                        1301747802075473537 default
zfs-home  version                     - default
zfs-home  bootfs                      - default
zfs-home  delegation                  on default
zfs-home  autoreplace                 off default
zfs-home  cachefile                   - default
zfs-home  failmode                    wait default
zfs-home  listsnapshots               off default
zfs-home  autoexpand                  off default
zfs-home  dedupditto                  0 default
zfs-home  dedupratio                  1.00x -
zfs-home  free                        518G -
zfs-home  allocated                   2.07T -
zfs-home  readonly                    off -
zfs-home  ashift                      12 local
zfs-home  comment                     - default
zfs-home  expandsize                  - -
zfs-home  freeing                     0 default
zfs-home  fragmentation               46% -
zfs-home  leaked                      0 default
zfs-home  feature@async_destroy       enabled local
zfs-home  feature@empty_bpobj         active local
zfs-home  feature@lz4_compress        active local
zfs-home  feature@spacemap_histogram  active local
zfs-home  feature@enabled_txg         active local
zfs-home  feature@hole_birth          active local
zfs-home  feature@extensible_dataset  enabled local
zfs-home  feature@embedded_data       active local
zfs-home  feature@bookmarks           enabled local
zfs-home  feature@filesystem_limits   enabled local
zfs-home  feature@large_blocks        enabled local 

root@carbon:~# zfs get all zfs-home
NAME      PROPERTY              VALUE                  SOURCE
zfs-home  type                  filesystem             -
zfs-home  creation              Mon Feb 15 15:38 2016  -
zfs-home  used                  2.07T                  -
zfs-home  available             436G                   -
zfs-home  referenced            96K                    -
zfs-home  compressratio         1.00x                  -
zfs-home  mounted               yes                    -
zfs-home  quota                 none                   default
zfs-home  reservation           none                   default
zfs-home  recordsize            128K                   default
zfs-home  mountpoint            /zfs-home              default
zfs-home  sharenfs              off                    default
zfs-home  checksum              on                     default
zfs-home  compression           lz4                    local
zfs-home  atime                 off                    local
zfs-home  devices               on                     default
zfs-home  exec                  on                     default
zfs-home  setuid                on                     default
zfs-home  readonly              off                    default
zfs-home  zoned                 off                    default
zfs-home  snapdir               hidden                 default
zfs-home  aclinherit            restricted             default
zfs-home  canmount              on                     default
zfs-home  xattr                 on                     default
zfs-home  copies                1                      default
zfs-home  version               5                      -
zfs-home  utf8only              off                    -
zfs-home  normalization         none                   -
zfs-home  casesensitivity       sensitive              -
zfs-home  vscan                 off                    default
zfs-home  nbmand                off                    default
zfs-home  sharesmb              off                    default
zfs-home  refquota              none                   default
zfs-home  refreservation        none                   default
zfs-home  primarycache          all                    default
zfs-home  secondarycache        all                    default
zfs-home  usedbysnapshots       0                      -
zfs-home  usedbydataset         96K                    -
zfs-home  usedbychildren        2.07T                  -
zfs-home  usedbyrefreservation  0                      -
zfs-home  logbias               latency                default
zfs-home  dedup                 off                    default
zfs-home  mlslabel              none                   default
zfs-home  sync                  standard               default
zfs-home  refcompressratio      1.00x                  -
zfs-home  written               96K                    -
zfs-home  logicalused           2.07T                  -
zfs-home  logicalreferenced     40K                    -
zfs-home  filesystem_limit      none                   default
zfs-home  snapshot_limit        none                   default
zfs-home  filesystem_count      none                   default
zfs-home  snapshot_count        none                   default
zfs-home  snapdev               hidden                 default
zfs-home  acltype               off                    default
zfs-home  context               none                   default
zfs-home  fscontext             none                   default
zfs-home  defcontext            none                   default
zfs-home  rootcontext           none                   default
zfs-home  relatime              off                    default
zfs-home  redundant_metadata    all                    default
zfs-home  overlay               off                    default

arcstats:

root@carbon:/proc/spl/kstat/zfs# cat arcstats
6 1 0x01 91 4368 1373235166 543289569407
name                            type data
hits                            4    1062178
misses                          4    57178
demand_data_hits                4    1020381
demand_data_misses              4    38785
demand_metadata_hits            4    41249
demand_metadata_misses          4    17480
prefetch_data_hits              4    0
prefetch_data_misses            4    81
prefetch_metadata_hits          4    548
prefetch_metadata_misses        4    832
mru_hits                        4    919424
mru_ghost_hits                  4    0
mfu_hits                        4    142210
mfu_ghost_hits                  4    0
deleted                         4    90
mutex_miss                      4    0
evict_skip                      4    12729
evict_not_enough                4    0
evict_l2_cached                 4    0
evict_l2_eligible               4    1027072
evict_l2_ineligible             4    12288
evict_l2_skip                   4    0
hash_elements                   4    45038
hash_elements_max               4    45038
hash_collisions                 4    504
hash_chains                     4    461
hash_chain_max                  4    2
p                               4    6442450944
c                               4    12884901888
c_min                           4    1073741824
c_max                           4    12884901888
size                            4    4938511312
hdr_size                        4    19380152
data_size                       4    4786416640
metadata_size                   4    96765952
other_size                      4    35948568
anon_size                       4    82039808
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    3964531200
mru_evictable_data              4    3872922112
mru_evictable_metadata          4    16476160
mru_ghost_size                  4    0
mru_ghost_evictable_data        4    0
mru_ghost_evictable_metadata    4    0
mfu_size                        4    836611584
mfu_evictable_data              4    831718912
mfu_evictable_metadata          4    1880064
mfu_ghost_size                  4    0
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    0
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_cdata_free_on_write          4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
l2_compress_successes           4    0
l2_compress_zeros               4    0
l2_compress_failures            4    0
memory_throttle_count           4    0
duplicate_buffers               4    0
duplicate_buffers_size          4    0
duplicate_reads                 4    0
memory_direct_count             4    0
memory_indirect_count           4    0
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    152094672
arc_meta_limit                  4    6242852352
arc_meta_max                    4    305471104
arc_meta_min                    4    16777216
arc_need_free                   4    0
arc_sys_free                    4    260116480

Other users have also had problems with mmap-based software on ZFS (see "MongoDB and ZFS bad performance: disk always busy with reads while doing only writes").
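For those staying on ZFS on Linux, one workaround sometimes suggested for mmap-heavy clients is capping how much rtorrent keeps mapped at once in ~/.rtorrent.rc. The option names below vary across rtorrent versions and are assumptions to verify against your build's documentation:

```
# Illustrative values; check the exact option name for your rtorrent version
pieces.memory.max.set = 512M    # newer (0.9.x) syntax
# max_memory_usage = 512M       # older syntax
```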

linux
debian
zfs
asked on Server Fault Feb 28, 2016 by Sidias-Korrado • edited Apr 13, 2017 by Community

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0