My system runs on a RAID 5 array behind a RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
For about a week the system has been getting very slow. I get messages like this on the command line (taken from syslog):
Jan 17 18:16:12 HAUPTRECHNER kernel: [ 840.329151] megacli.real D ffff880402c7fc88 0 4058 4057 0x00000000
Jan 17 18:16:12 HAUPTRECHNER kernel: [ 840.329186] [<ffffffffc001ce51>] megasas_issue_blocked_cmd+0x121/0x210 [megaraid_sas]
Jan 17 18:16:12 HAUPTRECHNER kernel: [ 840.329200] [<ffffffffc00242a4>] megasas_mgmt_fw_ioctl+0x3e4/0xae0 [megaraid_sas]
Jan 17 18:16:12 HAUPTRECHNER kernel: [ 840.329210] [<ffffffffc0024b6b>] megasas_mgmt_ioctl_fw.isra.25+0x1cb/0x230 [megaraid_sas]
Jan 17 18:16:12 HAUPTRECHNER kernel: [ 840.329218] [<ffffffffc0024e48>] megasas_mgmt_ioctl+0x28/0x40 [megaraid_sas]
Besides this, the syslog is also flooded with the following message, and I have no idea what it means:
Jan 17 18:13:44 HAUPTRECHNER kernel: [ 692.360649] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5a, hostdiag=ffffffff
Jan 17 18:13:45 HAUPTRECHNER kernel: [ 692.464643] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5b, hostdiag=ffffffff
Jan 17 18:13:45 HAUPTRECHNER kernel: [ 692.568659] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5c, hostdiag=ffffffff
Jan 17 18:13:45 HAUPTRECHNER kernel: [ 692.672630] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5d, hostdiag=ffffffff
Jan 17 18:13:45 HAUPTRECHNER kernel: [ 692.776626] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5e, hostdiag=ffffffff
Jan 17 18:13:45 HAUPTRECHNER kernel: [ 692.880619] megaraid_sas 0000:02:00.0: RESET_GEN2: retry=5f, hostdiag=ffffffff
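To put a number on how heavy this flood is, the retry lines can be counted per PCI device. The following is only a sketch: `count_reset_retries` is a made-up helper name, and the log path in the usage comment is an assumption about where syslog lives on this machine.

```shell
#!/bin/sh
# count_reset_retries: hypothetical helper, not part of megacli or the driver.
# Reads kernel log text on stdin and prints "<pci-address> <count>" for every
# device that logged megaraid_sas RESET_GEN2 retry lines.
count_reset_retries() {
    awk '/megaraid_sas/ && /RESET_GEN2/ {
             # The PCI address is the field right after "megaraid_sas".
             for (i = 1; i < NF; i++)
                 if ($i == "megaraid_sas") {
                     dev = $(i + 1)
                     sub(/:$/, "", dev)   # drop the trailing colon
                     n[dev]++
                 }
         }
         END { for (d in n) print d, n[d] }'
}

# Usage (the log path is an assumption):
#   count_reset_retries < /var/log/syslog
```

Running it once while the system is slow and once after a "good" reboot would show whether the resets only occur in the bad state.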
When this happens, some commands such as fdisk do not return (after a while the "blocked for more than 120 seconds" message appears), and I get weird output from ls like this (ignore the dates):
# ls -l
-rw-r--r-- 1 root root 1148 Jan 9 11:53 file
-rw-r--r-- 1 root root 1320 Dez 13 10:28 file.1
-rw-r--r-- 1 root root 300 Apr 1 2018 file.10.gz
-????-???- 1 ???? ???? 252 Feb 12 2018 file.11.gz
-rw-r--r-- 1 root root 2121 Jan 31 2018 file.12.gz
-rw-r--r-- 1 root root 980 Nov 29 18:05 file.2.gz
-????-???- 1 ???? ???? 252 Feb 12 2018 file.3.gz
-????-???- 1 ???? ???? 252 Feb 12 2018 file.4.gz
-rw-r--r-- 1 root root 1889 Okt 31 17:17 file.5.gz
-????-???- 1 ???? ???? 252 Feb 12 2018 file.6.gz
-????-???- 1 ???? ???? 252 Feb 12 2018 file.7.gz
At other times (after two or three reboots) the system behaves normally and ls shows normal output.
The hard drives themselves look OK:
# megacli -PDList -a0 | egrep "flagged|Temperature|Firmware s|Port Number:"
Firmware state: Online, Spun Up
Connected Port Number: 1(path0)
Drive Temperature :65C (149.00 F)
Drive has flagged a S.M.A.R.T alert : No
Firmware state: Online, Spun Up
Connected Port Number: 2(path0)
Drive Temperature :69C (156.20 F)
Drive has flagged a S.M.A.R.T alert : No
Firmware state: Online, Spun Up
Connected Port Number: 3(path0)
Drive Temperature :68C (154.40 F)
Drive has flagged a S.M.A.R.T alert : No
Firmware state: Online, Spun Up
Connected Port Number: 0(path0)
Drive Temperature :60C (140.00 F)
Drive has flagged a S.M.A.R.T alert : No
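The four-line groups above are easier to compare when collapsed to one line per drive. Below is a minimal sketch that does this for the filtered output shown here. The function name `summarize_drives` and the default limit of 55 °C are assumptions made for illustration, so the limit should be replaced with the maximum operating temperature from the drives' datasheet.

```shell
#!/bin/sh
# summarize_drives [LIMIT_C]: hypothetical helper. Reads the filtered
# `megacli -PDList` output (the four line types shown above) on stdin and
# prints one line per drive. LIMIT_C defaults to 55, an arbitrary assumed
# threshold; drives reporting a higher temperature get a trailing "HOT".
summarize_drives() {
    awk -v limit="${1:-55}" '
        /^Firmware state:/        { sub(/^Firmware state: */, ""); state = $0 }
        /^Connected Port Number:/ { port = $4 }
        /^Drive Temperature/      {
            temp = $0
            sub(/^[^:]*: */, "", temp)   # strip the label up to the colon
            sub(/C.*/, "", temp)         # keep only the Celsius number
        }
        /S\.M\.A\.R\.T alert/     {
            # The alert line closes each group in this output, so emit the record here.
            smart = $NF
            note = (temp + 0 > limit + 0) ? " HOT" : ""
            printf "port=%s temp=%sC smart_alert=%s state=%s%s\n", port, temp, smart, state, note
        }'
}

# Usage, reusing the command from above:
#   megacli -PDList -a0 | egrep "flagged|Temperature|Firmware s|Port Number:" | summarize_drives 55
```

The sketch relies on the field order of the filtered output above, so it would need adjusting if a different megacli version prints the fields in another order.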
Is this evidence that the RAID controller is dying and should be replaced?