MySQL crashes. Bad hard drive or hardware?

-1

I've seen high load and MySQL crash twice in one week now. Could the kernel errors below be the cause? Any ideas?

    Jan  3 09:49:19 HOST kernel: [2272100.568769]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
    Jan  3 09:49:19 HOST kernel: [2272100.569023] ata2.00: status: { DRDY ERR }
    Jan  3 09:49:19 HOST kernel: [2272100.569089] ata2.00: error: { UNC }
    Jan  3 09:49:19 HOST kernel: [2272100.577394] ata2.00: configured for UDMA/133
    Jan  3 09:49:19 HOST kernel: [2272100.577418] ata2: EH complete
    Jan  3 09:49:26 HOST kernel: [2272107.699341] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Jan  3 09:49:26 HOST kernel: [2272107.699569] ata2.00: BMDMA stat 0x25
    Jan  3 09:49:26 HOST kernel: [2272107.699643] ata2.00: failed command: READ DMA EXT
    Jan  3 09:49:26 HOST kernel: [2272107.699713] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
    Jan  3 09:49:26 HOST kernel: [2272107.699715]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
    Jan  3 09:49:26 HOST kernel: [2272107.699966] ata2.00: status: { DRDY ERR }
    Jan  3 09:49:26 HOST kernel: [2272107.700030] ata2.00: error: { UNC }
    Jan  3 09:49:26 HOST kernel: [2272107.708509] ata2.00: configured for UDMA/133
    Jan  3 09:49:26 HOST kernel: [2272107.708534] ata2: EH complete
    Jan  3 09:49:33 HOST kernel: [2272114.833522] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Jan  3 09:49:33 HOST kernel: [2272114.833603] ata2.00: BMDMA stat 0x25
    Jan  3 09:49:33 HOST kernel: [2272114.833669] ata2.00: failed command: READ DMA EXT
    Jan  3 09:49:33 HOST kernel: [2272114.833737] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
    Jan  3 09:49:33 HOST kernel: [2272114.833739]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
    Jan  3 09:49:33 HOST kernel: [2272114.833992] ata2.00: status: { DRDY ERR }
    Jan  3 09:49:33 HOST kernel: [2272114.834056] ata2.00: error: { UNC }
    Jan  3 09:49:33 HOST kernel: [2272114.842578] ata2.00: configured for UDMA/133
    Jan  3 09:49:33 HOST kernel: [2272114.842604] ata2: EH complete
    Jan  3 09:49:40 HOST kernel: [2272121.959563] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Jan  3 09:49:40 HOST kernel: [2272121.959644] ata2.00: BMDMA stat 0x25
    Jan  3 09:49:40 HOST kernel: [2272121.959708] ata2.00: failed command: READ DMA EXT
    Jan  3 09:49:40 HOST kernel: [2272121.959778] ata2.00: cmd 25/00:38:78:7f:f1/00:00:35:00:00/e0 tag 0 dma 28672 in
    Jan  3 09:49:40 HOST kernel: [2272121.959780]          res 51/40:38:78:7f:f1/40:00:35:00:00/e0 Emask 0x9 (media error)
    Jan  3 09:49:40 HOST kernel: [2272121.961337] ata2.00: status: { DRDY ERR }
    Jan  3 09:49:40 HOST kernel: [2272121.961400] ata2.00: error: { UNC }
    Jan  3 09:49:40 HOST kernel: [2272121.968673] ata2.00: configured for UDMA/133
    Jan  3 09:49:40 HOST kernel: [2272121.968701] sd 1:0:0:0: [sda] Unhandled sense code
    Jan  3 09:49:40 HOST kernel: [2272121.968706] sd 1:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    Jan  3 09:49:40 HOST kernel: [2272121.968714] sd 1:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
    Jan  3 09:49:40 HOST kernel: [2272121.968723] Descriptor sense data with sense descriptors (in hex):
    Jan  3 09:49:40 HOST kernel: [2272121.968729]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
    Jan  3 09:49:40 HOST kernel: [2272121.968743]         35 f1 7f 78
    Jan  3 09:49:40 HOST kernel: [2272121.968749] sd 1:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
    Jan  3 09:49:40 HOST kernel: [2272121.968759] sd 1:0:0:0: [sda] CDB: Read(10): 28 00 35 f1 7f 78 00 00 38 00
    Jan  3 09:49:40 HOST kernel: [2272121.968778] ata2: EH complete
    Jan  3 09:47:45 HOST kernel: [2272007.394223]  [<ffffffffa00c9638>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394232]  [<ffffffffa00a0563>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394241]  [<ffffffffa00a05dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394253]  [<ffffffffa00d593c>] ? ext4_xattr_get+0x10c/0x2c0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394262]  [<ffffffffa00a08d0>] ext4_dirty_inode+0x40/0x60 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394266]  [<ffffffff811c7b2b>] __mark_inode_dirty+0x3b/0x160
    Jan  3 09:47:45 HOST kernel: [2272007.394270]  [<ffffffff811b792a>] file_update_time+0x10a/0x1a0
    Jan  3 09:47:45 HOST kernel: [2272007.394274]  [<ffffffff8112ac6c>] __generic_file_write_iter+0x1fc/0x420
    Jan  3 09:47:45 HOST kernel: [2272007.394278]  [<ffffffff81127571>] ? file_read_iter_actor+0x61/0x80
    Jan  3 09:47:45 HOST kernel: [2272007.394282]  [<ffffffff8112af15>] __generic_file_aio_write+0x85/0xa0
    Jan  3 09:47:45 HOST kernel: [2272007.394287]  [<ffffffff8112af9f>] generic_file_aio_write+0x6f/0xe0
    Jan  3 09:47:45 HOST kernel: [2272007.394295]  [<ffffffffa009a331>] ext4_file_write+0x61/0x1e0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394299]  [<ffffffff8119c78a>] do_sync_write+0xfa/0x140
    Jan  3 09:47:45 HOST kernel: [2272007.394303]  [<ffffffff81097d70>] ? autoremove_wake_function+0x0/0x40
    Jan  3 09:47:45 HOST kernel: [2272007.394307]  [<ffffffff8119ca68>] vfs_write+0xb8/0x1a0
    Jan  3 09:47:45 HOST kernel: [2272007.394311]  [<ffffffff8119d542>] sys_pwrite64+0x82/0xa0
    Jan  3 09:47:45 HOST kernel: [2272007.394315]  [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
    Jan  3 09:47:45 HOST kernel: [2272007.394319] INFO: task mysqld:1241 blocked for more than 120 seconds.
    Jan  3 09:47:45 HOST kernel: [2272007.394389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Jan  3 09:47:45 HOST kernel: [2272007.394581] mysqld        D ffff88004dda2f40     0  1241   3454    0 0x00000000
    Jan  3 09:47:45 HOST kernel: [2272007.394585]  ffff88007df63958 0000000000000082 0000000000000000 00000000ffffffff
    Jan  3 09:47:45 HOST kernel: [2272007.394590]  ffff8800ffffffff 0000000000055c14 ffff88007df638e8 ffffffff8112806e
    Jan  3 09:47:45 HOST kernel: [2272007.394594]  000000000001b900 ffff88004dda3508 ffff88007df63fd8 000000000001e9c0
    Jan  3 09:47:45 HOST kernel: [2272007.394598] Call Trace:
    Jan  3 09:47:45 HOST kernel: [2272007.394601]  [<ffffffff8112806e>] ? find_get_page+0x1e/0xa0
    Jan  3 09:47:45 HOST kernel: [2272007.394608]  [<ffffffffa006d0bd>] do_get_write_access+0x29d/0x510 [jbd2]
    Jan  3 09:47:45 HOST kernel: [2272007.394612]  [<ffffffff81097db0>] ? wake_bit_function+0x0/0x50
    Jan  3 09:47:45 HOST kernel: [2272007.394618]  [<ffffffffa006d481>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
    Jan  3 09:47:45 HOST kernel: [2272007.394629]  [<ffffffffa00c9638>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394643]  [<ffffffffa00a0563>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394653]  [<ffffffffa00a05dc>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394664]  [<ffffffffa00d593c>] ? ext4_xattr_get+0x10c/0x2c0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394677]  [<ffffffffa00a08d0>] ext4_dirty_inode+0x40/0x60 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394683]  [<ffffffff811c7b2b>] __mark_inode_dirty+0x3b/0x160
    Jan  3 09:47:45 HOST kernel: [2272007.394690]  [<ffffffff811b792a>] file_update_time+0x10a/0x1a0
    Jan  3 09:47:45 HOST kernel: [2272007.394697]  [<ffffffff8112ac6c>] __generic_file_write_iter+0x1fc/0x420
    Jan  3 09:47:45 HOST kernel: [2272007.394704]  [<ffffffff81127571>] ? file_read_iter_actor+0x61/0x80
    Jan  3 09:47:45 HOST kernel: [2272007.394712]  [<ffffffff8112af15>] __generic_file_aio_write+0x85/0xa0
    Jan  3 09:47:45 HOST kernel: [2272007.394719]  [<ffffffff8112af9f>] generic_file_aio_write+0x6f/0xe0
    Jan  3 09:47:45 HOST kernel: [2272007.394730]  [<ffffffffa009a331>] ext4_file_write+0x61/0x1e0 [ext4]
    Jan  3 09:47:45 HOST kernel: [2272007.394738]  [<ffffffff8119c78a>] do_sync_write+0xfa/0x140
    Jan  3 09:47:45 HOST kernel: [2272007.394744]  [<ffffffff81097d70>] ? autoremove_wake_function+0x0/0x40
    Jan  3 09:47:45 HOST kernel: [2272007.394751]  [<ffffffff8119ca68>] vfs_write+0xb8/0x1a0
    Jan  3 09:47:45 HOST kernel: [2272007.394757]  [<ffffffff8119d542>] sys_pwrite64+0x82/0xa0
    Jan  3 09:47:45 HOST kernel: [2272007.394764]  [<ffffffff8100b182>] system_call_fastpath+0x16/0x1b
    Jan  3 09:47:52 HOST kernel: [2272013.885915] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
    Jan  3 09:47:52 HOST kernel: [2272013.885998] ata2.00: BMDMA stat 0x25
linux
hard-drive
hardware
ext4
asked on Server Fault Jan 3, 2013 by Mike Janson • edited Mar 15, 2017 by wogsland

3 Answers

3

Congratulations, you have a classic URE (unrecoverable read error). Your error message even says so explicitly:

    Jan  3 09:49:40 HOST kernel: [2272121.968749] sd 1:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed

Have your datacenter replace the defective disk.
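If you want to confirm that it is the same sector failing on every retry, you can decode the address from the `CDB: Read(10)` line in the log. Below is a minimal sketch, assuming the standard Read(10) layout (opcode `28`, a 4-byte big-endian LBA in bytes 3 to 6, a 2-byte transfer length in bytes 8 and 9). The function name `decode_read10` is made up for illustration.

```shell
#!/bin/sh
# Decode the LBA and sector count from a SCSI Read(10) CDB.
# Arguments: the ten CDB bytes as two-digit hex strings, in order.
decode_read10() {
    if [ "$#" -ne 10 ] || [ "$1" != "28" ]; then
        echo "expected a 10-byte Read(10) CDB starting with 28" >&2
        return 1
    fi
    # Bytes 3-6 are the big-endian logical block address.
    lba=$(( 0x$3$4$5$6 ))
    # Bytes 8-9 are the big-endian transfer length in sectors.
    count=$(( 0x$8$9 ))
    echo "lba=$lba sectors=$count"
}

# Bytes copied from the "CDB: Read(10)" line in the question.
decode_read10 28 00 35 f1 7f 78 00 00 38 00
```

For the log in the question this prints `lba=905019256 sectors=56`. The same bytes (`78:7f:f1` and `35`) appear in every `cmd 25/...` line, so each retry is hitting the same region of the disk, and 56 sectors of 512 bytes matches the `dma 28672` in those lines.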

answered on Server Fault Jan 3, 2013 by Michael Hampton
0

I see multiple "DRDY ERR" messages, which point to a hard drive failure. Have you run fsck -cc to find bad sectors and mark them?

Note: make sure you boot into another OS, as you really shouldn't run fsck on a mounted partition. And back up, back up, back up!

answered on Server Fault Jan 3, 2013 by Taylor Jasko
0

First you should back up the data. That is the immediate priority.

The hard disk is bad for sure. You don't get DRDY errors, exception Emask messages, and SCSI sense key errors all at the same time from a healthy drive. It all points to one thing: the HDD is going bad.
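To see how often each of these signatures shows up, you can count them in a saved copy of the kernel log. This is only a rough sketch: the function name `summarize_disk_errors` is made up, and the patterns are simply the strings that appear in the log posted in the question.

```shell
#!/bin/sh
# Count lines matching each disk-error signature in a saved kernel log.
# Argument: path to a text file containing the kernel messages.
summarize_disk_errors() {
    logfile=$1
    for pattern in \
        'Emask 0x9 (media error)' \
        'error: { UNC }' \
        'Sense Key : Medium Error' \
        'auto reallocate failed'
    do
        # -F: fixed string, no regex; -c: print the number of matching lines.
        # grep exits non-zero when the count is 0, hence the "|| true".
        n=$(grep -F -c -- "$pattern" "$logfile" || true)
        printf '%s: %s\n' "$pattern" "$n"
    done
}

# Example (the file name is hypothetical; use wherever you saved the log):
# summarize_disk_errors kernel-excerpt.log
```

Non-zero counts for several of these at once, all on the same device, are consistent with a failing disk rather than a cabling or controller hiccup.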

Now, look at the call trace. It shows that ext4 has fetched the inode and is trying to mark it dirty, but mysqld's write is stuck waiting on the journal. If you wait too long, you risk the filesystem being remounted read-only. Don't run fsck until you have backed up.

When you do unmount the disk and run fsck, run it in verbose mode:

    fsck -fyv <partition-name>

If you can note down the errors, that might come in handy if you run into the issue again.
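One way to keep a record of what fsck reports is to pipe its output through `tee`. Below is a small sketch; the wrapper name `run_logged` and the log path in the example are made up, and the partition name is left as a placeholder exactly as above.

```shell
#!/bin/sh
# Run a command, show its output on the terminal, and append a copy to a log.
# Arguments: log file path, then the command and its arguments.
run_logged() {
    logfile=$1
    shift
    # 2>&1 merges stderr into stdout so error messages are saved too.
    # tee -a appends to the log while still printing to the terminal.
    # Note: the pipeline's exit status is tee's, not the command's.
    "$@" 2>&1 | tee -a "$logfile"
}

# Example (hypothetical log path; run only on an unmounted filesystem,
# and only after the backup is done):
# run_logged /root/fsck.log fsck -fyv <partition-name>
```

Write the log to a different disk than the one being checked, so the record survives if the failing drive gets worse.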

answered on Server Fault Jan 3, 2013 by Soham Chakraborty

User contributions licensed under CC BY-SA 3.0