I am backing up a large file system (~40 TB) using cp -a -v. The source filesystem is btrfs and the target is OCFS2.
The backup has been running for three days and has slowed down considerably, to ~200 kB/s of I/O (measured with iotop). The cp process is using ~2% CPU (measured with top). At the same time, page cache usage has grown to almost all available memory (512 GB). I also got a couple of hung-task warnings from the kernel:
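The measurements above came from invocations like the following (exact flags are an assumption on my part; iotop and top need appropriate privileges). The loop at the end is an unprivileged way to watch the kernel-wide dirty/writeback counters while the copy runs:

```shell
# Per-process I/O rates, batch mode, a few samples (needs root):
#   iotop -o -b -n 3 -d 5
# CPU usage of the cp process:
#   top -b -n 1 -p "$(pgrep -xo cp)"
# Unprivileged: sample the kernel-wide dirty/writeback counters
for i in 1 2 3; do
    grep -E '^(Dirty|Writeback):' /proc/meminfo
    sleep 1
done
```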
Mar 25 16:34:10 mamut kernel: INFO: task jbd2/dm-7-30:6098 blocked for more than 120 seconds.
Mar 25 16:34:10 mamut kernel: Not tainted 4.19.0-2-amd64 #1 Debian 4.19.16-1
Mar 25 16:34:10 mamut kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 25 16:34:10 mamut kernel: jbd2/dm-7-30 D 0 6098 2 0x80000000
Mar 25 16:34:10 mamut kernel: Call Trace:
Mar 25 16:34:10 mamut kernel: ? __schedule+0x2a2/0x870
Mar 25 16:34:10 mamut kernel: ? __wake_up_common_lock+0x89/0xc0
Mar 25 16:34:10 mamut kernel: ? finish_wait+0x80/0x80
Mar 25 16:34:10 mamut kernel: schedule+0x28/0x80
Mar 25 16:34:10 mamut kernel: jbd2_journal_commit_transaction+0x24b/0x17a0 [jbd2]
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x40/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x34/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x34/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x40/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x34/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x40/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x34/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x34/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x40/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x40/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x34/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x34/0x70
Mar 25 16:34:10 mamut kernel: ? finish_wait+0x80/0x80
Mar 25 16:34:10 mamut kernel: ? lock_timer_base+0x67/0x80
Mar 25 16:34:10 mamut kernel: kjournald2+0xbd/0x270 [jbd2]
Mar 25 16:34:10 mamut kernel: ? finish_wait+0x80/0x80
Mar 25 16:34:10 mamut kernel: ? commit_timeout+0x10/0x10 [jbd2]
Mar 25 16:34:10 mamut kernel: kthread+0x112/0x130
Mar 25 16:34:10 mamut kernel: ? kthread_bind+0x30/0x30
Mar 25 16:34:10 mamut kernel: ret_from_fork+0x22/0x40
Mar 25 16:34:10 mamut kernel: INFO: task ocfs2cmt-FB094E:6099 blocked for more than 120 seconds.
Mar 25 16:34:10 mamut kernel: Not tainted 4.19.0-2-amd64 #1 Debian 4.19.16-1
Mar 25 16:34:10 mamut kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 25 16:34:10 mamut kernel: ocfs2cmt-FB094E D 0 6099 2 0x80000000
Mar 25 16:34:10 mamut kernel: Call Trace:
Mar 25 16:34:10 mamut kernel: ? __schedule+0x2a2/0x870
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x40/0x70
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x34/0x70
Mar 25 16:34:10 mamut kernel: schedule+0x28/0x80
Mar 25 16:34:10 mamut kernel: rwsem_down_write_failed+0x183/0x3a0
Mar 25 16:34:10 mamut kernel: ? __switch_to_asm+0x40/0x70
Mar 25 16:34:11 mamut kernel: call_rwsem_down_write_failed+0x13/0x20
Mar 25 16:34:11 mamut kernel: down_write+0x29/0x40
Mar 25 16:34:11 mamut kernel: ocfs2_commit_thread+0x73/0x340 [ocfs2]
Mar 25 16:34:11 mamut kernel: ? finish_wait+0x80/0x80
Mar 25 16:34:11 mamut kernel: ? __ocfs2_journal_access+0x330/0x330 [ocfs2]
Mar 25 16:34:11 mamut kernel: kthread+0x112/0x130
Mar 25 16:34:11 mamut kernel: ? kthread_bind+0x30/0x30
Mar 25 16:34:11 mamut kernel: ret_from_fork+0x22/0x40
Mar 25 16:34:11 mamut kernel: INFO: task kworker/0:1:19880 blocked for more than 120 seconds.
Mar 25 16:34:11 mamut kernel: Not tainted 4.19.0-2-amd64 #1 Debian 4.19.16-1
Mar 25 16:34:11 mamut kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 25 16:34:11 mamut kernel: kworker/0:1 D 0 19880 2 0x80000000
Mar 25 16:34:11 mamut kernel: Workqueue: events qsync_work_fn [ocfs2]
Mar 25 16:34:11 mamut kernel: Call Trace:
Mar 25 16:34:11 mamut kernel: ? __schedule+0x2a2/0x870
Mar 25 16:34:11 mamut kernel: schedule+0x28/0x80
Mar 25 16:34:11 mamut kernel: schedule_timeout+0x26d/0x390
Mar 25 16:34:11 mamut kernel: ? find_busiest_group+0x78f/0xb10
Mar 25 16:34:11 mamut kernel: wait_for_completion+0x11f/0x190
Mar 25 16:34:11 mamut kernel: ? wake_up_q+0x70/0x70
Mar 25 16:34:11 mamut kernel: __ocfs2_cluster_lock.isra.41+0x4eb/0x7e0 [ocfs2]
Mar 25 16:34:11 mamut kernel: ? ocfs2_inode_lock_full_nested+0x110/0x9c0 [ocfs2]
Mar 25 16:34:11 mamut kernel: ocfs2_inode_lock_full_nested+0x110/0x9c0 [ocfs2]
Mar 25 16:34:11 mamut kernel: ? __switch_to_asm+0x40/0x70
Mar 25 16:34:11 mamut kernel: ? __switch_to_asm+0x34/0x70
Mar 25 16:34:11 mamut kernel: ? __switch_to_asm+0x40/0x70
Mar 25 16:34:11 mamut kernel: ocfs2_lock_global_qf+0x3b/0xe0 [ocfs2]
Mar 25 16:34:11 mamut kernel: ocfs2_sync_dquot_helper+0x110/0x290 [ocfs2]
Mar 25 16:34:11 mamut kernel: dquot_scan_active+0xe9/0x150
Mar 25 16:34:11 mamut kernel: ? __ocfs2_sync_dquot+0x510/0x510 [ocfs2]
Mar 25 16:34:11 mamut kernel: qsync_work_fn+0x63/0x70 [ocfs2]
Mar 25 16:34:11 mamut kernel: process_one_work+0x1a7/0x3a0
Mar 25 16:34:11 mamut kernel: worker_thread+0x30/0x390
Mar 25 16:34:11 mamut kernel: ? pwq_unbound_release_workfn+0xd0/0xd0
Mar 25 16:34:11 mamut kernel: kthread+0x112/0x130
Mar 25 16:34:11 mamut kernel: ? kthread_bind+0x30/0x30
Mar 25 16:34:11 mamut kernel: ret_from_fork+0x22/0x40
Mounted file systems (note that the source filesystem is mounted read-only):
/dev/mapper/fc_orca-part1 on /mnt/orca type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,coherency=full,user_xattr,acl,_netdev)
/dev/mapper/fc_trunk-part3 on /mnt/storage type btrfs (ro,relatime,compress=lzo,space_cache,subvolid=5,subvol=/,_netdev)
Memory usage:
# cat /proc/meminfo
MemTotal: 528350404 kB
MemFree: 4246168 kB
MemAvailable: 495941052 kB
Buffers: 3534068 kB
Cached: 484764104 kB
SwapCached: 0 kB
Active: 55833288 kB
Inactive: 459087632 kB
Active(anon): 26635204 kB
Inactive(anon): 57268 kB
Active(file): 29198084 kB
Inactive(file): 459030364 kB
Unevictable: 38896 kB
Mlocked: 38896 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 364 kB
Writeback: 0 kB
AnonPages: 26661756 kB
Mapped: 295396 kB
Shmem: 59944 kB
Slab: 8425756 kB
SReclaimable: 6515572 kB
SUnreclaim: 1910184 kB
KernelStack: 23024 kB
PageTables: 71604 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 264175200 kB
Committed_AS: 33497116 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 93696 kB
HardwareCorrupted: 0 kB
AnonHugePages: 25264128 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 4015888 kB
DirectMap2M: 376584192 kB
DirectMap1G: 156237824 kB
Buffering settings (unchanged from default):
sysctl -a | grep dirty.*ratio
vm.dirty_background_ratio = 10
vm.dirty_ratio = 20
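With 512 GB of RAM, these percentage-based thresholds allow on the order of 100 GB of dirty pages to accumulate before writers are throttled. A byte-based cap is a commonly suggested alternative on large-memory machines; this is a sketch I have not applied here (the sysctl -w lines need root, and the values are only an example):

```shell
# Current thresholds, read via procfs (ratios are a percentage of memory)
cat /proc/sys/vm/dirty_background_ratio   # 10 -> background writeback starts
cat /proc/sys/vm/dirty_ratio              # 20 -> writers are throttled
# Byte-based caps override the ratios when set to non-zero (root; example values):
#   sysctl -w vm.dirty_background_bytes=268435456   # 256 MB
#   sysctl -w vm.dirty_bytes=1073741824             # 1 GB
```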
1) The copying speed is unacceptable.
2) Running ls or du on the directory the current file is being copied to takes a very long time (~10 min) to finish.
3) Disk reads are ~200 kB/s, but writes are mostly 0 B/s. Maybe whole files are being flushed to OCFS2 at once?
4) I am not copying lots of tiny files; the file currently being copied is 3 GB in size.
5) Some months ago I benchmarked both filesystems, and both read and write exceeded 100 MB/s.
6) OCFS2 is a shared-disk filesystem. I unmounted it on all other nodes, but that did not speed up the copy.
7) Neither the target nor the source filesystem is close to full.
What could be the reason?
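In case it helps with diagnosis, this is one way to see which tasks are stuck in uninterruptible sleep (D state) and what they are waiting on; the commented root-only lines are a sketch, not something I have run here:

```shell
# Tasks in uninterruptible sleep and their wait channels
ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'
# With root, the kernel stack of a stuck process, e.g. cp:
#   cat /proc/"$(pgrep -xo cp)"/stack
#   echo w > /proc/sysrq-trigger   # dump all blocked task stacks to the kernel log
```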