Can't start VM on Hyper-V 2008R2 cluster

I have a Server 2008 R2 Hyper-V cluster with 2 nodes. They use a CSV on a SAN, and I use SCVMM to manage them. We recently had several crashes that caused a failover: the virtual machines on the failed node died and started up on the other node. For the most part, this worked fine. At one point during a power failure, both nodes were unable to access the SAN for a moment, so the CSV went offline. Bringing it back online in Failover Cluster Manager worked, and most of the virtual machines started just fine.

One virtual machine, however, will not start.

  • In SCVMM, it shows as missing.
  • In Failover Cluster Manager, it shows as Offline, with the "SCVMM hostname Configuration" resource failed.
  • Trying to start the failed Configuration resource, or to move the virtual machine to the other node, results in a 5-minute wait, followed by the error "Error Code: 0x80071714 The group is unable to accept the request since it is moving to another node".

Besides the error above, there don't seem to be any recent relevant entries in the failover cluster logs or the Windows event logs on either node. Failover Cluster Manager does show some critical events from when the failures happened last week:

  • Event ID 21502: 'SCVMM hostname Configuration' failed to register the virtual machine with the virtual machine management service.
  • 25 minutes later, Event ID 1230: Cluster resource 'SCVMM hostname Configuration' (resource type '', DLL 'vmclusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.
  • That one was repeated 3 more times, 5 minutes apart.
  • No logs since then.

I've looked at the virtual machine's files on the SAN, and all of them appear to be intact. The XML configuration file seems to be valid (some research showed that this problem can occur if the XML file is corrupted).
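For anyone wanting to repeat the XML check, here is a minimal sketch in Python of a well-formedness test. The function names are hypothetical, and the path to the configuration file has to be supplied by the caller. It only confirms that the file parses as XML; it does not prove that Hyper-V will accept its contents.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return True if the given bytes parse as XML, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


def check_file(path):
    """Read a file in binary mode and test whether it is well-formed XML.

    `path` is whatever location the VM configuration file has on the CSV;
    it is not hard-coded here because it differs per environment.
    """
    with open(path, "rb") as handle:
        return is_well_formed(handle.read())
```

Reading in binary mode lets the parser detect the file's encoding from its byte order mark or XML declaration rather than guessing.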

Edit: I have also run the cluster validation report. Besides the failed resource and some expected errors about not being able to test the disks while they are online, everything looks fine.

How do I go about getting this virtual machine running again?

hyper-v-server-2008-r2
failovercluster
scvmm
asked on Server Fault Jan 2, 2014 by Grant • edited Jan 3, 2014 by Grant

1 Answer

I never found out exactly what caused the problem, but it was pretty easy to get the VM running again:

  • Figure out which node the problem VM is on.
  • Put that node in maintenance mode in VMM (or just live migrate everything else off it). The problem VM will still be stuck on that node.
  • Stop the cluster service on that node, then start it again.

When I stopped the cluster service, the VM was immediately taken over by one of the remaining nodes and started up automatically.
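As a rough sketch of the steps above, the lookup and the service restart could be scripted through the FailoverClusters PowerShell module, assuming it is installed on the machine running the script. The node name `NODE1` and the Python helper names are hypothetical. By default the script is a dry run that only prints the commands, since stopping the cluster service affects everything still hosted on that node.

```python
import subprocess


def owner_query_command():
    """PowerShell command that lists each cluster group with its owner node and state."""
    return "Get-ClusterGroup | Format-Table Name, OwnerNode, State -AutoSize"


def restart_cluster_service_commands(node):
    """PowerShell commands, in order, that stop and then start the cluster service on one node."""
    return [
        "Import-Module FailoverClusters",
        f"Stop-ClusterNode -Name '{node}'",
        f"Start-ClusterNode -Name '{node}'",
    ]


def run(commands, dry_run=True):
    """Join the commands into one script; print it (dry run) or execute it with powershell.exe."""
    script = "; ".join(commands)
    if dry_run:
        print(script)
        return script
    # Only reached when dry_run=False, on a Windows host with PowerShell available.
    subprocess.run(["powershell.exe", "-NoProfile", "-Command", script], check=True)
    return script


if __name__ == "__main__":
    # Step 1: see which node owns the stuck VM's group.
    run(["Import-Module FailoverClusters", owner_query_command()])
    # Step 3: restart the cluster service on that node (hypothetical name).
    run(restart_cluster_service_commands("NODE1"))
```

Moving the healthy VMs off the node first (step 2) is left to VMM or Failover Cluster Manager, as described in the list.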

answered on Server Fault Jan 13, 2014 by Grant

User contributions licensed under CC BY-SA 3.0