HP DL165G7: NMI error

One of "my" DL165 G7 ProLiants has rebooted out of the blue for the second time this month. The reboot was accompanied by these system event log entries in LightsOut:

Event Type  Date        Time      Source       Description                             Direction
OEM         --          --        --           00 00 00 00 01 02 00 00 00 00 00 00 00  --
Generic     07/19/2013  16:40:38  NMI Detect   State Asserted                          Assertion
Generic     07/19/2013  16:40:42  Gen ID 0x41  Run-time Stop                           Assertion
OEM         07/19/2013  16:40:42  000137       01 80 00 00 00 01                       --
OEM         07/19/2013  16:40:42  000137       02 54 44 4f 00 01                       --
OEM         07/19/2013  16:40:42  000137       02 00 00 00 00 01                       --
OEM         07/19/2013  16:40:42  000137       03 00 00 00 00 01                       --
OEM         07/19/2013  16:40:42  000137       03 00 00 00 00 01                       --
OEM         07/19/2013  16:40:42  000137       04 00 00 00 00 01                       --
OEM         07/19/2013  16:40:42  000137       04 00 00 00 00 01                       --
OEM         07/19/2013  16:40:42  000137       05 00 00 00 00 01                       --
OEM         07/19/2013  16:40:42  000137       05 00 00 00 00 01                       --
Generic     07/19/2013  16:43:54  Gen ID 0x41  C: boot completed                       Assertion
OEM         07/19/2013  16:43:54  000137       00 b4 6c e9 51 00                       --

I have contacted HP support to get help decoding the events, but unfortunately without any notable success - I have been told that there is no accessible documentation available. What is it trying to tell me and how do I find out what is broken here?

Edit: the system is running Hyper-V 2012. The only useful event concerning the reset is Kernel-Power/41 with a BugcheckCode of 128 (0x00000080) and a BugcheckParameter1 of 0x4f4454. These match the first two OEM lines of the iLO event log that carry a timestamp, at least once you read their bytes in little-endian order. The bugcheck code led me to this MSDN article, which bluntly states that "the exact cause is difficult to determine".
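For anyone who wants to repeat the byte comparison, here is a minimal Python sketch. The layout is only my guess from the log above, not anything HP documents: I assume the first byte of each 6-byte OEM description is a record index, the next four bytes are a 32-bit little-endian value, and the last byte is something unknown that I ignore. The function name `decode_oem_description` is made up for this example.

```python
def decode_oem_description(description: str) -> tuple[int, int]:
    """Split an OEM description such as '01 80 00 00 00 01' into (index, value).

    Assumed layout (inferred from the log, not from HP documentation):
    byte 0 is a record index, bytes 1-4 are a 32-bit little-endian value,
    and the trailing byte is ignored because its meaning is unknown.
    """
    raw = bytes.fromhex(description)  # fromhex tolerates the spaces between bytes
    if len(raw) != 6:
        raise ValueError(f"expected 6 bytes, got {len(raw)}")
    index = raw[0]
    value = int.from_bytes(raw[1:5], byteorder="little")
    return index, value


if __name__ == "__main__":
    # The two OEM descriptions logged right after the "Run-time Stop" event.
    for line in ("01 80 00 00 00 01", "02 54 44 4f 00 01"):
        index, value = decode_oem_description(line)
        print(f"record {index}: 0x{value:08x}")
```

Under that assumption, the two records decode to 0x00000080 and 0x004f4454, the same values as BugcheckCode and BugcheckParameter1 in the Kernel-Power/41 event.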

In the HP support center I found a seemingly similar problem description, where the solution was to synchronize the clocks between cluster nodes. My failing host does run in a cluster, but its clocks are synchronized, and I cannot reproduce the issue by letting the clocks drift apart: aside from the obvious Kerberos authentication problems, nothing much happens when I desync them.

The odd information I have been able to collect on the issue so far:

hardware
hp
ipmi
asked on Server Fault Jul 19, 2013 by the-wabbit • edited Jul 19, 2013 by the-wabbit

1 Answer

I had a similar problem with an HP ProLiant DL380 G6 running Windows 2008 R2. Digging through the support and help forums got me nowhere, so I eventually used the HP Smart Update Manager DVD to install all the latest updates on the server. A year and a half has passed since then with no errors so far.

It might be a long shot, but try installing the latest updates; here's the latest HP SUM DVD.

If you try to run that on a 2012 server, you might get an error saying that it is not compatible. According to HP that is normal, and you only need to ignore the error.

Hope this helps.

answered on Server Fault Jul 22, 2013 by Noor Khaldi

User contributions licensed under CC BY-SA 3.0