Full Memory Dump: https://pastebin.com/spkLeVYL
Crash message is:
USER_MODE_HEALTH_MONITOR (9e)
One or more critical user mode components failed to satisfy a health check.
Hardware mechanisms such as watchdog timers can detect that basic kernel
services are not executing. However, resource starvation issues, including
memory leaks, lock contention, and scheduling priority misconfiguration,
may block critical user mode components without blocking DPCs or
draining the nonpaged pool.
Kernel components can extend watchdog timer functionality to user mode
by periodically monitoring critical applications. This bugcheck indicates
that a user mode health check failed in a manner such that graceful
shutdown is unlikely to succeed. It restores critical services by
rebooting and/or allowing application failover to other servers.
Arguments:
Arg1: ffffe00026e00780, Process that failed to satisfy a health check within the configured timeout
Arg2: 000000000000003c, Health monitoring timeout (seconds)
Arg3: 000000000000000a, WatchdogSourceClussvcIsAlive
Cluster service sends heartbeat to netft every 500 millseconds.
By default netft expects at least 1 heartbeat per second.
If this watchdog was triggered that means clussvc is not getting
CPU to send heartbeats.
Arg4: 0000000000000000
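For reference, the arguments above are hexadecimal. A minimal Python sketch for decoding them follows; the `ARGS` dictionary and `decode_args` helper are names made up for illustration, and the values are copied from the output above.

```python
# Raw argument values copied from the bugcheck output above.
ARGS = {
    "Arg1": "ffffe00026e00780",  # process that failed the health check
    "Arg2": "000000000000003c",  # health monitoring timeout (seconds)
    "Arg3": "000000000000000a",  # WatchdogSourceClussvcIsAlive
    "Arg4": "0000000000000000",
}


def decode_args(args):
    """Convert the hexadecimal argument strings to integers."""
    return {name: int(value, 16) for name, value in args.items()}


decoded = decode_args(ARGS)
timeout_seconds = decoded["Arg2"]
watchdog_source = decoded["Arg3"]

print(f"timeout: {timeout_seconds} s, watchdog source id: {watchdog_source}")
```

So Arg2 (0x3c) is a 60 second timeout, and Arg3 (0xa) is watchdog source 10.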
Something in user mode caused the Failover Clustering service to become unresponsive, so the problem calls for general hang debugging of user-mode processes. Clustering has health detection between the user-mode service and the kernel-mode NetFT driver. If user mode goes unresponsive, clustering bugchecks the box in an effort to force a failover. A STOP 0x9e is therefore expected cluster behavior: a STOP 0x9e for netft.sys is an intentional bugcheck caused by the cluster service when a deadlock condition is identified.
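To make the health-detection idea concrete, here is a toy Python model of a heartbeat watchdog. It is only a sketch of the general technique, not how NetFT is actually implemented; the `HeartbeatWatchdog` class and its methods are hypothetical names, and the 0.5 second interval and 60 second timeout are taken from the bugcheck output.

```python
class HeartbeatWatchdog:
    """Toy model of a monitor that expects periodic heartbeats."""

    def __init__(self, timeout_seconds, start_time=0.0):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = start_time

    def heartbeat(self, now):
        """Record that the monitored service reported in at time `now`."""
        self.last_heartbeat = now

    def expired(self, now):
        """Return True if no heartbeat arrived within the timeout."""
        return (now - self.last_heartbeat) > self.timeout_seconds


watchdog = HeartbeatWatchdog(timeout_seconds=60)

# Healthy phase: the service sends a heartbeat every 0.5 seconds.
t = 0.0
for _ in range(10):
    t += 0.5
    watchdog.heartbeat(t)
healthy = watchdog.expired(t + 1.0)

# Hang phase: no heartbeat for 61 seconds, so the timeout is exceeded.
hung = watchdog.expired(t + 61.0)

print(f"expired while healthy: {healthy}, expired after hang: {hung}")
```

In this model the monitor only sees missing heartbeats; it cannot tell why the service stopped sending them, which is why the hang itself still needs to be debugged in user mode.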
I found the following in an article. Should I change the recovery action, HangRecoveryAction?
This property controls the action to take if the user-mode processes have stopped responding. HangRecoveryAction has 4 different settings, with 3 being the default.
0 = Disables the heartbeat and monitoring mechanism.
1 = Logs an event in the system log of the Event Viewer.
2 = Terminates the Cluster Service.
3 = Causes a Stop error (Bugcheck) on the cluster node. <<– default for 2008
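The four settings above can be written as a small lookup table. This is just an illustrative Python sketch; `HANG_RECOVERY_ACTIONS` and `describe_hang_recovery_action` are hypothetical names, and the descriptions are copied from the list above.

```python
# Settings as listed in the article quoted above.
HANG_RECOVERY_ACTIONS = {
    0: "Disables the heartbeat and monitoring mechanism",
    1: "Logs an event in the system log of the Event Viewer",
    2: "Terminates the Cluster Service",
    3: "Causes a Stop error (bugcheck) on the cluster node",
}


def describe_hang_recovery_action(value):
    """Return the description for a HangRecoveryAction setting."""
    try:
        return HANG_RECOVERY_ACTIONS[value]
    except KeyError:
        raise ValueError(f"unknown HangRecoveryAction value: {value!r}") from None


for setting in sorted(HANG_RECOVERY_ACTIONS):
    print(setting, "=", describe_hang_recovery_action(setting))
```

With the setting at 3, a STOP 0x9e like the one above is the configured response to a user-mode hang.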
The server is running 2012 R2.