Symptoms of a RAID controller dying?


We have a Windows Server machine running 24/7.
I have been worried for quite a while, ever since I started looking at the Windows event log.
There I found many instances of "Kernel-Power Event ID 41",
indicating that several times a day (mostly during the night) the server rebooted without a clean shutdown, i.e. unexpectedly after a crash.

The server had been running rock solid for years!
So my first assumption was faulty software introduced by recent patches.
But I just couldn't make out any pattern as to why or when exactly a crash would occur.
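One quick way to check whether the crashes really have no pattern is to bucket the Event ID 41 timestamps by hour of day. The Python sketch below is only an illustration under stated assumptions: it assumes you have already exported the event timestamps (for example from Event Viewer) as strings in `YYYY-MM-DD HH:MM:SS` format, and the function names, the format string, and the 22:00 to 06:00 "night" window are hypothetical choices, not anything defined by Windows.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

# Assumed timestamp format of the exported events; adjust to match your export.
DEFAULT_FMT = "%Y-%m-%d %H:%M:%S"


def crashes_by_hour(timestamps: Iterable[str], fmt: str = DEFAULT_FMT) -> Counter:
    """Count Kernel-Power Event ID 41 occurrences per hour of day (0-23)."""
    # strip() tolerates trailing newlines when timestamps are read from a file.
    return Counter(datetime.strptime(ts.strip(), fmt).hour for ts in timestamps)


def night_share(counts: Counter, start: int = 22, end: int = 6) -> float:
    """Fraction of events in the hypothetical overnight window [start:00, end:00)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # The window wraps past midnight, so an hour counts as "night" if it is
    # late in the evening OR early in the morning.
    night = sum(n for hour, n in counts.items() if hour >= start or hour < end)
    return night / total


def print_histogram(counts: Counter) -> None:
    """Print one row per hour of day with a simple text bar."""
    for hour in range(24):
        n = counts.get(hour, 0)
        print(f"{hour:02d}:00  {'#' * n} {n}")
```

Calling `print_histogram(crashes_by_hour(stamps))` on the exported list shows at a glance whether the reboots cluster at particular hours; if they do, those hours can be compared against scheduled tasks or backup jobs that run on the server at that time.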

Searching the web for Kernel-Power Event ID 41 mainly points to hardware issues:
PSU glitches, CPU or memory overheating, etc.

The server has an LSI MegaRAID 9260-4i with 4 physical HDDs, configured as two RAID 1 arrays of two disks each.
The RAID controller logs don't show anything suspicious (none of the physical disks report any problems).

So I'm currently thinking the RAID controller itself may be having problems.
This idea is backed up by the following two observations:

1)
I boot from the Windows Server installation CD,
go into the recovery options,
and select "restore from backup" (with the USB backup HDD connected).
At a certain stage during the restore procedure it throws error 0x80070002.
If I then switch over to the command prompt, no drives are visible.

2)
Something quite similar happens with Acronis True Image.
I boot from the ATI recovery CD
and choose to back up my partitions.
Processing starts,
but at some point it throws an error.
After I cancel that backup and go to "backup my disks and partitions", the disk and partition list is completely empty!

--

All of the above leads me to assume the following:
the RAID controller itself (not the physical HDDs) must be defective,
and right in the middle of operations the logical drives just "disappear".

While Windows Server is running, this causes an OS crash followed by a reboot.
During a Windows backup restore from CD, the drives suddenly disappear.
During an ATI backup from CD, the drives suddenly disappear.

--

Considering all of the above:
is it safe to assume that these are symptoms of the RAID controller itself dying, and that neither the physical HDDs nor any other system components are causing the problems?

To get the current problems solved:
would the best option be to replace the current RAID controller with an identical one?

asked on Server Fault Mar 25, 2021 by paulgutten

1 Answer


The biggest part of IT is: don't panic and don't assume.

Often you have to load additional drivers into Windows recovery discs, install discs, or Acronis boot media before they can see the RAID configuration. Knowing the version of Windows Server would help determine whether the RAID controller drivers should already be on the recovery media. Also, if you did not build the Acronis media on that server, it likely doesn't include the drivers needed to see the RAID controller.

Side note: check the power profile in Control Panel and make sure the drive(s) and the system never go to sleep or power down. This is likely not the issue, but check it anyway. Let us know which version of Windows Server you are running.

Cheers!

answered on Server Fault Mar 25, 2021 by bitcollision

User contributions licensed under CC BY-SA 3.0