I have a problem that occurs regularly, roughly every 5 seconds: the rtorrent process gets stuck in disk wait (D state) for approximately 1-2 seconds, then continues at full speed. Sometimes the hangs last minutes. The workload is very heavy random-read IO with only a few writes.
I tried strace, and it shows the freezes happening right after mmap() calls.
EDIT: solved by moving to a FreeBSD VM.
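For reference, a minimal sketch of the kind of strace invocation that shows these stalls (the flags are one reasonable choice and the PID is taken from the dmesg trace below; not necessarily the exact command used):
# -f follows threads, -tt adds timestamps, -T prints time spent in each syscall;
# the 1-2 second gaps appear right after mmap() returns
strace -f -tt -T -e trace=mmap,munmap -p 2635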
Setup:
rc.local (the problem happens even without these settings):
# use the noop I/O scheduler on all pool disks
echo noop > /sys/block/sda/queue/scheduler
echo noop > /sys/block/sdb/queue/scheduler
echo noop > /sys/block/sdc/queue/scheduler
echo noop > /sys/block/sdd/queue/scheduler
# powersave CPU frequency governor
modprobe cpufreq_powersave
echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo powersave > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
# ZFS tunables: allow prefetched buffers into L2ARC, disable file-level prefetch,
# raise the txg commit interval to 600 s, set the L2ARC write/boost limits (bytes per feed interval)
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch
echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
echo 600 > /sys/module/zfs/parameters/zfs_txg_timeout
echo 1000000 > /sys/module/zfs/parameters/l2arc_write_max
echo 1000000 > /sys/module/zfs/parameters/l2arc_write_boost
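The same ZFS module parameters can also be set persistently through modprobe options instead of rc.local; a minimal sketch, assuming the standard ZFS-on-Linux modprobe.d mechanism (the file name is illustrative):
# /etc/modprobe.d/zfs.conf -- applied whenever the zfs module is loaded
options zfs zfs_prefetch_disable=1 zfs_txg_timeout=600
options zfs l2arc_noprefetch=0 l2arc_write_max=1000000 l2arc_write_boost=1000000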
Debian jessie amd64, Pentium G4400, 16 GB RAM (not ECC)
Kernel command line: BOOT_IMAGE=/root@/boot/vmlinuz-4.3.0-0.bpo.1-amd64 root=ZFS=carbon-system/root ro boot=zfs quiet cgroup_enable=memory
ZFS: Loaded module v0.6.5.2-2, ZFS pool version 5000, ZFS filesystem version 5
dmesg
[ 599.819766] INFO: task rtorrent main:2635 blocked for more than 120 seconds.
[ 599.819886] Tainted: P O 3.16.0-4-amd64 #1
[ 599.819998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 599.820119] rtorrent main D ffff88040ccf8fb8 0 2635 2629 0x00000000
[ 599.820125] ffff88040ccf8b60 0000000000000082 0000000000012f00 ffff88041536ffd8
[ 599.820130] 0000000000012f00 ffff88040ccf8b60 ffff880436d137b0 ffff880436fd6508
[ 599.820135] 0000000000000002 ffffffff8113ce00 ffff88041536f590 ffff88041536f668
[ 599.820140] Call Trace:
[ 599.820147] [<ffffffff8113ce00>] ? wait_on_page_read+0x60/0x60
[ 599.820152] [<ffffffff81510759>] ? io_schedule+0x99/0x120
[ 599.820157] [<ffffffff8113ce0a>] ? sleep_on_page+0xa/0x10
[ 599.820162] [<ffffffff81510adc>] ? __wait_on_bit+0x5c/0x90
[ 599.820167] [<ffffffff8113cbff>] ? wait_on_page_bit+0x7f/0x90
[ 599.820173] [<ffffffff810a7bd0>] ? autoremove_wake_function+0x30/0x30
[ 599.820179] [<ffffffff8114f3fe>] ? page_evictable+0xe/0x40
[ 599.820185] [<ffffffff8114fb47>] ? shrink_page_list+0x677/0xa50
[ 599.820193] [<ffffffff81150552>] ? shrink_inactive_list+0x192/0x500
[ 599.820199] [<ffffffff81151161>] ? shrink_lruvec+0x511/0x6a0
[ 599.820207] [<ffffffff810a1ec1>] ? load_balance+0x191/0x850
[ 599.820213] [<ffffffff810a1ec1>] ? load_balance+0x191/0x850
[ 599.820220] [<ffffffff81151364>] ? shrink_zone+0x74/0x1b0
[ 599.820226] [<ffffffff8115189d>] ? do_try_to_free_pages+0x12d/0x520
[ 599.820233] [<ffffffff81151e7f>] ? try_to_free_mem_cgroup_pages+0x9f/0x130
[ 599.820239] [<ffffffff8119e5ae>] ? mem_cgroup_reclaim+0x3e/0xc0
[ 599.820244] [<ffffffff8119f18f>] ? mem_cgroup_try_charge+0x1df/0x440
[ 599.820281] [<ffffffffa0277999>] ? dnode_rele_and_unlock+0x49/0x80 [zfs]
[ 599.820286] [<ffffffff8119fa50>] ? mem_cgroup_try_charge_mm+0x30/0xa0
[ 599.820292] [<ffffffff8151244e>] ? mutex_lock+0xe/0x2a
[ 599.820297] [<ffffffff811a0cff>] ? mem_cgroup_charge_file+0x6f/0xc0
[ 599.820330] [<ffffffffa02e83f6>] ? zfs_getpage+0x156/0x200 [zfs]
[ 599.820337] [<ffffffff8113daef>] ? __add_to_page_cache_locked+0x3f/0x1f0
[ 599.820363] [<ffffffffa0303010>] ? zpl_writepages+0x170/0x170 [zfs]
[ 599.820369] [<ffffffff8113dcd2>] ? add_to_page_cache_lru+0x22/0x70
[ 599.820374] [<ffffffff8114973b>] ? read_cache_pages+0x7b/0x120
[ 599.820379] [<ffffffff8114996b>] ? __do_page_cache_readahead+0x18b/0x210
[ 599.820385] [<ffffffff81471eaf>] ? tcp_write_xmit+0x17f/0xca0
[ 599.820390] [<ffffffff81149b39>] ? ondemand_readahead+0x149/0x280
[ 599.820396] [<ffffffff8113dd46>] ? pagecache_get_page_fixed+0x26/0x1b0
[ 599.820402] [<ffffffff8113f71f>] ? filemap_fault+0x2cf/0x420
[ 599.820407] [<ffffffff8116548a>] ? __do_fault+0x3a/0xa0
[ 599.820412] [<ffffffff8116804e>] ? do_read_fault.isra.54+0x4e/0x300
[ 599.820418] [<ffffffff8116987c>] ? handle_mm_fault+0x63c/0x11c0
[ 599.820424] [<ffffffff8116fcfc>] ? mmap_region+0x19c/0x650
[ 599.820430] [<ffffffff810572b7>] ? __do_page_fault+0x177/0x4f0
[ 599.820436] [<ffffffff814062a8>] ? SYSC_sendto+0x148/0x180
[ 599.820444] [<ffffffff810115dc>] ? __switch_to+0x15c/0x570
[ 599.820449] [<ffffffff81515ce8>] ? page_fault+0x28/0x30
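The trace above only appears once a hang exceeds the 120-second hung-task watchdog. For the shorter 1-2 second stalls, the kernel stack can be captured on demand while the process is stuck; a minimal sketch, assuming root (PID taken from the dmesg line above):
cat /proc/2635/stack            # kernel stack of the blocked rtorrent thread
echo w > /proc/sysrq-trigger    # or, if sysrq is enabled, dump all blocked (D-state) tasks to dmesg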
zpool status:
pool: zfs-home
state: ONLINE
scan: scrub canceled on Sat Feb 27 15:25:35 2016
config:
NAME STATE READ WRITE CKSUM
zfs-home ONLINE 0 0 0
disk3000 ONLINE 0 0 0
errors: No known data errors
Tried both kernels, 3.16 and 4.3; it happens on both.
root@carbon:~# cat /proc/version
Linux version 4.3.0-0.bpo.1-amd64 (debian-kernel@lists.debian.org) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Debian 4.3.3-7~bpo8+1 (2016-01-19)
root@carbon:~# dkms status
spl, 0.6.5, 3.16.0-4-amd64, x86_64: installed
spl, 0.6.5, 4.3.0-0.bpo.1-amd64, x86_64: installed
zfs, 0.6.5.2, 3.16.0-4-amd64, x86_64: installed
zfs, 0.6.5.2, 4.3.0-0.bpo.1-amd64, x86_64: installed
ioping during freezes:
root@carbon:/sys/module/zfs/parameters# ioping /dev/mapper/disk3000
4 KiB from /dev/mapper/disk3000 (block device 2.58 TiB): request=1 time=10.6 ms
4 KiB from /dev/mapper/disk3000 (block device 2.58 TiB): request=2 time=19.8 ms
4 KiB from /dev/mapper/disk3000 (block device 2.58 TiB): request=3 time=20.0 ms
4 KiB from /dev/mapper/disk3000 (block device 2.58 TiB): request=4 time=23.4 ms
4 KiB from /dev/mapper/disk3000 (block device 2.58 TiB): request=5 time=21.0 ms
iostat
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.00 50.00 0.00 50.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.00 49.75 0.00 49.75
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
iotop -ob
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
Total DISK READ : 102.94 K/s | Total DISK WRITE : 976.02 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
12908 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.02 % [kworker/1:1]
25787 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.02 % [kworker/0:1]
20295 be/4 www-data 102.94 K/s 976.02 B/s 0.00 % 0.00 % apache2 -k start
Total DISK READ : 55.13 K/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
12908 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.13 % [kworker/1:1]
25787 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.04 % [kworker/0:1]
19720 be/4 root 3.80 K/s 0.00 B/s 0.00 % 0.00 % python /usr/sbin/iotop -ob
20295 be/4 www-data 51.33 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start
Total DISK READ : 51.38 K/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
12908 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.02 % [kworker/1:1]
25787 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.02 % [kworker/0:1]
20295 be/4 www-data 51.38 K/s 0.00 B/s 0.00 % 0.00 % apache2 -k start
ZFS parameters:
root@carbon:~# zpool get all zfs-home
NAME PROPERTY VALUE SOURCE
zfs-home size 2.58T -
zfs-home capacity 80% -
zfs-home altroot - default
zfs-home health ONLINE -
zfs-home guid 1301747802075473537 default
zfs-home version - default
zfs-home bootfs - default
zfs-home delegation on default
zfs-home autoreplace off default
zfs-home cachefile - default
zfs-home failmode wait default
zfs-home listsnapshots off default
zfs-home autoexpand off default
zfs-home dedupditto 0 default
zfs-home dedupratio 1.00x -
zfs-home free 518G -
zfs-home allocated 2.07T -
zfs-home readonly off -
zfs-home ashift 12 local
zfs-home comment - default
zfs-home expandsize - -
zfs-home freeing 0 default
zfs-home fragmentation 46% -
zfs-home leaked 0 default
zfs-home feature@async_destroy enabled local
zfs-home feature@empty_bpobj active local
zfs-home feature@lz4_compress active local
zfs-home feature@spacemap_histogram active local
zfs-home feature@enabled_txg active local
zfs-home feature@hole_birth active local
zfs-home feature@extensible_dataset enabled local
zfs-home feature@embedded_data active local
zfs-home feature@bookmarks enabled local
zfs-home feature@filesystem_limits enabled local
zfs-home feature@large_blocks enabled local
root@carbon:~# zfs get all zfs-home
NAME PROPERTY VALUE SOURCE
zfs-home type filesystem -
zfs-home creation Mon Feb 15 15:38 2016 -
zfs-home used 2.07T -
zfs-home available 436G -
zfs-home referenced 96K -
zfs-home compressratio 1.00x -
zfs-home mounted yes -
zfs-home quota none default
zfs-home reservation none default
zfs-home recordsize 128K default
zfs-home mountpoint /zfs-home default
zfs-home sharenfs off default
zfs-home checksum on default
zfs-home compression lz4 local
zfs-home atime off local
zfs-home devices on default
zfs-home exec on default
zfs-home setuid on default
zfs-home readonly off default
zfs-home zoned off default
zfs-home snapdir hidden default
zfs-home aclinherit restricted default
zfs-home canmount on default
zfs-home xattr on default
zfs-home copies 1 default
zfs-home version 5 -
zfs-home utf8only off -
zfs-home normalization none -
zfs-home casesensitivity sensitive -
zfs-home vscan off default
zfs-home nbmand off default
zfs-home sharesmb off default
zfs-home refquota none default
zfs-home refreservation none default
zfs-home primarycache all default
zfs-home secondarycache all default
zfs-home usedbysnapshots 0 -
zfs-home usedbydataset 96K -
zfs-home usedbychildren 2.07T -
zfs-home usedbyrefreservation 0 -
zfs-home logbias latency default
zfs-home dedup off default
zfs-home mlslabel none default
zfs-home sync standard default
zfs-home refcompressratio 1.00x -
zfs-home written 96K -
zfs-home logicalused 2.07T -
zfs-home logicalreferenced 40K -
zfs-home filesystem_limit none default
zfs-home snapshot_limit none default
zfs-home filesystem_count none default
zfs-home snapshot_count none default
zfs-home snapdev hidden default
zfs-home acltype off default
zfs-home context none default
zfs-home fscontext none default
zfs-home defcontext none default
zfs-home rootcontext none default
zfs-home relatime off default
zfs-home redundant_metadata all default
zfs-home overlay off default
ARCSTATS
root@carbon:/proc/spl/kstat/zfs# cat arcstats
6 1 0x01 91 4368 1373235166 543289569407
name type data
hits 4 1062178
misses 4 57178
demand_data_hits 4 1020381
demand_data_misses 4 38785
demand_metadata_hits 4 41249
demand_metadata_misses 4 17480
prefetch_data_hits 4 0
prefetch_data_misses 4 81
prefetch_metadata_hits 4 548
prefetch_metadata_misses 4 832
mru_hits 4 919424
mru_ghost_hits 4 0
mfu_hits 4 142210
mfu_ghost_hits 4 0
deleted 4 90
mutex_miss 4 0
evict_skip 4 12729
evict_not_enough 4 0
evict_l2_cached 4 0
evict_l2_eligible 4 1027072
evict_l2_ineligible 4 12288
evict_l2_skip 4 0
hash_elements 4 45038
hash_elements_max 4 45038
hash_collisions 4 504
hash_chains 4 461
hash_chain_max 4 2
p 4 6442450944
c 4 12884901888
c_min 4 1073741824
c_max 4 12884901888
size 4 4938511312
hdr_size 4 19380152
data_size 4 4786416640
metadata_size 4 96765952
other_size 4 35948568
anon_size 4 82039808
anon_evictable_data 4 0
anon_evictable_metadata 4 0
mru_size 4 3964531200
mru_evictable_data 4 3872922112
mru_evictable_metadata 4 16476160
mru_ghost_size 4 0
mru_ghost_evictable_data 4 0
mru_ghost_evictable_metadata 4 0
mfu_size 4 836611584
mfu_evictable_data 4 831718912
mfu_evictable_metadata 4 1880064
mfu_ghost_size 4 0
mfu_ghost_evictable_data 4 0
mfu_ghost_evictable_metadata 4 0
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_lock_retry 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_evict_l1cached 4 0
l2_free_on_write 4 0
l2_cdata_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_asize 4 0
l2_hdr_size 4 0
l2_compress_successes 4 0
l2_compress_zeros 4 0
l2_compress_failures 4 0
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 0
memory_indirect_count 4 0
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 152094672
arc_meta_limit 4 6242852352
arc_meta_max 4 305471104
arc_meta_min 4 16777216
arc_need_free 4 0
arc_sys_free 4 260116480
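To check whether the freezes coincide with the ARC hitting its limit or with memory throttling, the relevant counters can be watched live during a stall; a minimal sketch (field names as in the arcstats dump above):
watch -n 1 "grep -Ew '^(size|c|c_max|arc_meta_used|arc_no_grow|memory_throttle_count)' /proc/spl/kstat/zfs/arcstats"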
Other users have also had problems with mmap-based software (see: MongoDB and ZFS bad performance: disk always busy with reads while doing only writes).