Linux RAID reshape stalled, array unmountable


I added another drive to my RAID 5 and migrated it to RAID 6. Everything went fine at first, but now the reshape seems to be stuck at 63.2% and the md2_raid6 thread is using 99.9% of the CPU.

RAID Details:

/dev/md/2:
        Version : 1.2
  Creation Time : Thu Jan 16 21:17:54 2014
     Raid Level : raid6
     Array Size : 19534435840 (18629.49 GiB 20003.26 GB)
  Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
   Raid Devices : 8
  Total Devices : 8
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Jun 29 09:37:25 2014
          State : active, reshaping
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

 Reshape Status : 63% complete
  Delta Devices : 1, (7->8)

           Name : random:2
           UUID : a2636675:62df921b:94c0ff95:64fca10e
         Events : 83461

Number   Major   Minor   RaidDevice State
   0       8       64        0      active sync   /dev/sde
   1       8       96        1      active sync   /dev/sdg
   2       8       80        2      active sync   /dev/sdf
   3       8       48        3      active sync   /dev/sdd
   4       8      128        4      active sync   /dev/sdi
   5       8      112        5      active sync   /dev/sdh
   6       8      144        6      active sync   /dev/sdj
   7       8       32        7      active sync   /dev/sdc

mdstat says:

Personalities : [raid6] [raid5] [raid4]
md2 : active raid6 sde[0] sdc[7] sdj[6] sdh[5] sdi[4] sdd[3] sdf[2] sdg[1]
      19534435840 blocks super 1.2 level 6, 512k chunk, algorithm 2 [8/8] [UUUUUUUU]
      [============>........]  reshape = 63.2% (2469935104/3906887168) finish=942584.6min speed=25K/sec
      bitmap: 1/30 pages [4KB], 65536KB chunk

unused devices:

The reported speed keeps dropping and the reshape appears to be stuck at block 2469935104. I can no longer mount the array either; the mount command just hangs.
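
To tell a stalled reshape from a merely slow one, the progress line from /proc/mdstat can be parsed and compared between two readings. The following is only a minimal sketch: the function names `parse_reshape` and `is_stalled` are made up for illustration, and it assumes the progress line has the same shape as the one quoted above (block counts in 1 KiB units, speed in K/sec).

```python
import re

# Matches the progress part of a /proc/mdstat reshape line, e.g.
# "reshape = 63.2% (2469935104/3906887168) finish=...min speed=25K/sec"
RESHAPE_RE = re.compile(
    r"reshape\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)"
    r".*?speed=(?P<speed>\d+)K/sec"
)


def parse_reshape(mdstat_text):
    """Return reshape progress as a dict, or None if no reshape line is found."""
    m = RESHAPE_RE.search(mdstat_text)
    if m is None:
        return None
    done = int(m.group("done"))
    total = int(m.group("total"))
    speed = int(m.group("speed"))
    remaining = total - done
    # Naive estimate: remaining blocks divided by the current speed.
    eta_min = remaining / speed / 60 if speed > 0 else None
    return {
        "percent": float(m.group("pct")),
        "done_kib": done,
        "total_kib": total,
        "remaining_kib": remaining,
        "speed_kib_s": speed,
        "eta_min": eta_min,
    }


def is_stalled(earlier, later):
    """True if the completed block count did not advance between two readings."""
    return later["done_kib"] <= earlier["done_kib"]


if __name__ == "__main__":
    # Reads the live file; this only works on a Linux host with md arrays.
    with open("/proc/mdstat") as fh:
        print(parse_reshape(fh.read()))
```

With the numbers quoted above, 1436952064 blocks remain, and at 25K/sec the naive estimate is roughly 958,000 minutes. Two readings taken a minute apart with an unchanged block count would confirm that the position is not advancing at all.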

dmesg says:

> [  374.558818] md/raid:md2: not clean -- starting background reconstruction
> [  374.558821] md/raid:md2: reshape will continue
> [  374.558834] md/raid:md2: device sdh operational as raid disk 0
> [  374.558835] md/raid:md2: device sdj operational as raid disk 1
> [  374.558836] md/raid:md2: device sdi operational as raid disk 2
> [  374.558837] md/raid:md2: device sdg operational as raid disk 3
> [  374.558838] md/raid:md2: device sde operational as raid disk 4
> [  374.558839] md/raid:md2: device sdc operational as raid disk 7
> [  374.558840] md/raid:md2: device sdf operational as raid disk 6
> [  374.558840] md/raid:md2: device sdd operational as raid disk 5
> [  374.559378] md/raid:md2: allocated 0kB
> [  374.559414] md/raid:md2: raid level 6 active with 8 out of 8 devices, algorithm 2
> [  374.559416] RAID conf printout:
> [  374.559417]  --- level:6 rd:8 wd:8
> [  374.559418]  disk 0, o:1, dev:sdh
> [  374.559420]  disk 1, o:1, dev:sdj
> [  374.559421]  disk 2, o:1, dev:sdi
> [  374.559422]  disk 3, o:1, dev:sdg
> [  374.559424]  disk 4, o:1, dev:sde
> [  374.559425]  disk 5, o:1, dev:sdd
> [  374.559426]  disk 6, o:1, dev:sdf
> [  374.559428]  disk 7, o:1, dev:sdc
> [  374.559582] created bitmap (30 pages) for device md2
> [  374.560264] md2: bitmap initialized from disk: read 2 pages, set 1 of 59615 bits
> [  375.292856] md2: detected capacity change from 0 to 20003262300160
> [  375.292876] md: reshape of RAID array md2
> [  375.292879] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [  375.292881] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
> [  375.292889] md: using 128k window, over a total of 3906887168k.
> [  375.448756]  md2: unknown partition table

After a while dmesg says:

> [  599.954327] INFO: task md2_reshape:2635 blocked for more than 120 seconds.
> [  599.954330]       Not tainted 3.13.0-30-generic #54-Ubuntu
> [  599.954331] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  599.954332] md2_reshape     D ffff88042fb94440     0  2635      2 0x00000000
> [  599.954335]  ffff880415127ae8 0000000000000002 ffff880414258000 ffff880415127fd8
> [  599.954337]  0000000000014440 0000000000014440 ffff880414258000 ffff880416d15dc8
> [  599.954339]  ffff880416d15c18 ffff880416d15c00 ffff880415127b50 ffff880416d15e18
> [  599.954341] Call Trace:
> [  599.954347]  [<ffffffff8171e749>] schedule+0x29/0x70
> [  599.954359]  [<ffffffffa0380862>] get_active_stripe+0x1d2/0x7c0 [raid456]
> [  599.954364]  [<ffffffff810aaa44>] ? __wake_up+0x44/0x50
> [  599.954367]  [<ffffffff810aaea0>] ? prepare_to_wait_event+0x100/0x100
> [  599.954370]  [<ffffffffa038126a>] reshape_request+0x24a/0x950 [raid456]
> [  599.954372]  [<ffffffff810aaea0>] ? prepare_to_wait_event+0x100/0x100
> [  599.954375]  [<ffffffffa0385c5f>] sync_request+0x23f/0x3e0 [raid456]
> [  599.954378]  [<ffffffff815a41f1>] ? is_mddev_idle+0xd1/0x140
> [  599.954380]  [<ffffffff815a74d3>] md_do_sync+0x993/0xdc0
> [  599.954382]  [<ffffffff8171e2d1>] ? __schedule+0x381/0x7d0
> [  599.954384]  [<ffffffff815a4088>] md_thread+0x118/0x130
> [  599.954386]  [<ffffffff810aaea0>] ? prepare_to_wait_event+0x100/0x100
> [  599.954388]  [<ffffffff815a3f70>] ? mddev_unlock+0xe0/0xe0
> [  599.954390]  [<ffffffff8108b322>] kthread+0xd2/0xf0
> [  599.954392]  [<ffffffff8108b250>] ? kthread_create_on_node+0x1d0/0x1d0
> [  599.954395]  [<ffffffff8172ac3c>] ret_from_fork+0x7c/0xb0
> [  599.954397]  [<ffffffff8108b250>] ? kthread_create_on_node+0x1d0/0x1d0
> [  599.954398] INFO: task blkid:2637 blocked for more than 120 seconds.
> [  599.954399]       Not tainted 3.13.0-30-generic #54-Ubuntu
> [  599.954399] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  599.954400] blkid           D ffff88042fb14440     0  2637      1 0x00000004
> [  599.954402]  ffff880414a23930 0000000000000002 ffff8800c09e2fe0 ffff880414a23fd8
> [  599.954403]  0000000000014440 0000000000014440 ffff8800c09e2fe0 ffff880416d15d98
> [  599.954405]  ffff880416d15c0c ffff880416d15c00 ffff880414a23998 ffff880416d15e18
> [  599.954406] Call Trace:
> [  599.954408]  [<ffffffff8171e749>] schedule+0x29/0x70
> [  599.954411]  [<ffffffffa0380862>] get_active_stripe+0x1d2/0x7c0 [raid456]
> [  599.954413]  [<ffffffff810aaea0>] ? prepare_to_wait_event+0x100/0x100
> [  599.954415]  [<ffffffffa0385fdc>] make_request+0x1dc/0xc00 [raid456]
> [  599.954417]  [<ffffffff810aaea0>] ? prepare_to_wait_event+0x100/0x100
> [  599.954419]  [<ffffffff815a0f05>] md_make_request+0xd5/0x220
> [  599.954431]  [<ffffffff81150ce5>] ? mempool_alloc_slab+0x15/0x20
> [  599.954435]  [<ffffffff81334d42>] generic_make_request+0xc2/0x110
> [  599.954436]  [<ffffffff81334e01>] submit_bio+0x71/0x150
> [  599.954439]  [<ffffffff811f3e96>] ? bio_alloc_bioset+0x196/0x2a0
> [  599.954442]  [<ffffffff811eef35>] _submit_bh+0x135/0x200
> [  599.954444]  [<ffffffff811f1847>] block_read_full_page+0x1e7/0x2e0
> [  599.954449]  [<ffffffff811a1985>] ? kmem_cache_alloc+0x35/0x1e0
> [  599.954450]  [<ffffffff811f50f0>] ? I_BDEV+0x10/0x10
> [  599.954453]  [<ffffffff8116c395>] ? __inc_zone_page_state+0x35/0x40
> [  599.954455]  [<ffffffff8114ebeb>] ? add_to_page_cache_locked+0xbb/0x1b0
> [  599.954457]  [<ffffffff811f58c8>] blkdev_readpage+0x18/0x20
> [  599.954458]  [<ffffffff8115a8c8>] __do_page_cache_readahead+0x1e8/0x260
> [  599.954460]  [<ffffffff8115abed>] force_page_cache_readahead+0x6d/0xa0
> [  599.954462]  [<ffffffff8115af03>] page_cache_sync_readahead+0x43/0x50
> [  599.954463]  [<ffffffff81150a85>] generic_file_aio_read+0x4c5/0x700
> [  599.954465]  [<ffffffff811f5c6b>] blkdev_aio_read+0x4b/0x70
> [  599.954468]  [<ffffffff811bc05a>] do_sync_read+0x5a/0x90
> [  599.954470]  [<ffffffff811bc6f5>] vfs_read+0x95/0x160
> [  599.954471]  [<ffffffff811bd209>] SyS_read+0x49/0xa0
> [  599.954473]  [<ffffffff8172aeff>] tracesys+0xe1/0xe6
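
Both blocked tasks in the trace above (md2_reshape and blkid) are waiting in get_active_stripe in the raid456 module. To pull that summary out of a longer dmesg capture, something like the sketch below could be used. It is an illustration only: `hung_tasks` is a hypothetical helper, and the regular expressions assume the hung-task report format shown in this post (kernel 3.13), which may differ on other kernels.

```python
import re

# "INFO: task md2_reshape:2635 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# "[<ffffffffa0380862>] get_active_stripe+0x1d2/0x7c0 [raid456]"
# A leading "? " marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unsure>\? )?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def hung_tasks(dmesg_text):
    """Return one dict per hung-task report, with its reliable stack frames."""
    tasks = []
    current = None
    for line in dmesg_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            current = {
                "name": m.group("name"),
                "pid": int(m.group("pid")),
                "secs": int(m.group("secs")),
                "frames": [],
            }
            tasks.append(current)
            continue
        f = FRAME_RE.search(line)
        # Skip frames marked "?" and any frames seen before the first report.
        if f and current is not None and not f.group("unsure"):
            current["frames"].append((f.group("func"), f.group("module")))
    return tasks


if __name__ == "__main__":
    import sys

    # Usage (assumed): dmesg | python3 hung_tasks.py
    for task in hung_tasks(sys.stdin.read()):
        top = task["frames"][:3]
        print(task["name"], task["pid"], top)
```

Frames are returned innermost first, so the first entries show where each task is sleeping.
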
linux
raid
software-raid
asked on Super User Jun 29, 2014 by user339656 • edited Jun 29, 2014 by cybernard


User contributions licensed under CC BY-SA 3.0