Semi-failed RAID6, still able to copy data but incredibly slow


I had a 5-drive software RAID6 array set up with mdadm (2 parity drives), and a drive failed. I ordered a replacement, and when I powered off the machine to swap out the failed drive, ANOTHER drive failed at the same time (completely dead). So now there are 3 of the original drives with data, 1 new drive that is rebuilding, and 1 missing drive.

I then noticed that the rebuild was going incredibly slowly; data was only copying at around 100 KB/s. Prior rebuilds ran at around 100 MB/s! I decided to purchase a Synology appliance with new drives and copy as much data off as I could while the array still works. It's been running for 2 months and I've been able to copy a few TB off, but there are still several TB to go, and at this rate it will take another 6 months to finish.

The data arriving on the new NAS (Synology) is fine; there has been no data loss so far! I was hoping there was something I could do to make it go faster. The error logs show it is failing on a specific drive (sdd), but perhaps there is a setting that will tell it to "fail faster" on bad reads so that the copy speeds up, since the drive is not actually failing outright? Logs are below; a sketch of the settings that usually control this follows them.

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf1[5] sdb1[0] sdc1[1] sdd1[2]
      17581168128 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/3] [UUU__]
      [>....................]  recovery =  0.4% (24696932/5860389376) finish=584364702.6min speed=0K/sec

unused devices: <none>

tail of /var/log/messages

Dec 16 11:29:47 [localhost] kernel: ata4.00: status: { DRDY ERR }
Dec 16 11:29:47 [localhost] kernel: ata4.00: error: { UNC }
Dec 16 11:29:47 [localhost] kernel: ata4: hard resetting link
Dec 16 11:29:47 [localhost] kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 16 11:29:47 [localhost] kernel: ata4.00: configured for UDMA/133
Dec 16 11:29:47 [localhost] kernel: sd 3:0:0:0: [sdd] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 16 11:29:47 [localhost] kernel: sd 3:0:0:0: [sdd] tag#24 Sense Key : Medium Error [current] [descriptor]
Dec 16 11:29:47 [localhost] kernel: sd 3:0:0:0: [sdd] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed
Dec 16 11:29:47 [localhost] kernel: sd 3:0:0:0: [sdd] tag#24 CDB: Read(16) 88 00 00 00 00 00 02 87 64 50 00 00 00 40 00 00
Dec 16 11:29:47 [localhost] kernel: blk_update_request: I/O error, dev sdd, sector 42427472
Dec 16 11:29:47 [localhost] kernel: raid5_end_read_request: 5 callbacks suppressed
Dec 16 11:29:47 [localhost] kernel: md/raid:md0: read error not correctable (sector 42425424 on sdd1).
Dec 16 11:29:47 [localhost] kernel: md/raid:md0: read error not correctable (sector 42425432 on sdd1).
Dec 16 11:29:47 [localhost] kernel: md/raid:md0: read error not correctable (sector 42425440 on sdd1).
Dec 16 11:29:47 [localhost] kernel: md/raid:md0: read error not correctable (sector 42425448 on sdd1).
Dec 16 11:29:47 [localhost] kernel: md/raid:md0: read error not correctable (sector 42425456 on sdd1).
Dec 16 11:29:47 [localhost] kernel: md/raid:md0: read error not correctable (sector 42425464 on sdd1).
Dec 16 11:29:47 [localhost] kernel: md/raid:md0: read error not correctable (sector 42425472 on sdd1).
Dec 16 11:29:47 [localhost] kernel: md/raid:md0: read error not correctable (sector 42425480 on sdd1).
Dec 16 11:29:47 [localhost] kernel: ata4: EH complete
Dec 16 11:29:51 [localhost] kernel: ata4.00: exception Emask 0x0 SAct 0x10000000 SErr 0x0 action 0x0
Dec 16 11:29:51 [localhost] kernel: ata4.00: irq_stat 0x40000008
Dec 16 11:29:51 [localhost] kernel: ata4.00: failed command: READ FPDMA QUEUED
Dec 16 11:29:51 [localhost] kernel: ata4.00: cmd 60/38:e0:30:b8:f5/00:00:02:00:00/40 tag 28 ncq 28672 in#012         res 41/40:00:30:b8:f5/00:00:02:00:00/00 Emask 0x409 (media error) <F>
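For context, the behaviour being asked about is usually governed by the drive's SCT Error Recovery Control (ERC) and the kernel's per-device command timeout, plus md's own sync speed limits. A rough sketch of where to look, assuming sdd is the failing drive from the logs above and that smartmontools is installed (values are illustrative; shortening the retry time does not recover bad sectors, it only keeps each one from stalling the array for as long):

# check whether the drive supports SCT ERC and what it is currently set to
smartctl -l scterc /dev/sdd

# if supported, cap error recovery at 7 seconds (values are in tenths of a second)
smartctl -l scterc,70,70 /dev/sdd

# if ERC is not supported, raise the kernel's per-command timeout (default 30s)
# so the drive is not reset in the middle of its own retries
echo 180 > /sys/block/sdd/device/timeout

# md's rebuild speed throttles, in KB/s
cat /proc/sys/dev/raid/speed_limit_min
cat /proc/sys/dev/raid/speed_limit_max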
mdadm
raid6
asked on Super User Dec 16, 2019 by Russ

1 Answer


There are several posts on Super User that are similar to this one, and all of them are unanswered. That's because you should use ddrescue to repair the volume first; after that, rsync will be fine.

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
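A minimal sketch of that workflow, assuming GNU ddrescue (the gddrescue package on most distros), that /dev/sdd is the failing member from the logs above, and that /dev/sde is a hypothetical blank disk at least as large as sdd; double-check the device names before writing anything:

# pass 1: copy everything that reads cleanly, skipping bad areas quickly
ddrescue -f -n /dev/sdd /dev/sde sdd.map

# pass 2: go back and retry the bad areas a few times
ddrescue -f -d -r3 /dev/sdd /dev/sde sdd.map

# with the original sdd disconnected, re-read the clone's partition table,
# stop the degraded array and reassemble it with the clone standing in for sdd
partprobe /dev/sde
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sde1

# re-add the new drive so the interrupted rebuild can restart
mdadm /dev/md0 --add /dev/sdf1

Once the array is running from the clone instead of the failing disk, the copy to the Synology should run at normal speed again.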

answered on Super User Dec 21, 2019 by Russ
