Windows Server 2012 R2 (Hyper-V VMs) - random BSOD

Question

My Hyper-V VMs running Windows Server 2012 R2 restart themselves quite often with a BSOD: CRITICAL_STRUCTURE_CORRUPTION (109). Most recently it happened 11 times over one weekend. The hardware is new: two Supermicro servers. I installed Windows Server 2012 R2 with the Hyper-V role on both servers, along with the drivers from the Supermicro website. Each Hyper-V host runs three guest VMs: two with Windows Server 2012 and one with Windows Server 2012 R2. Only the Windows Server 2012 R2 VMs restart randomly; the Windows Server 2012 VMs are fine. All systems are clean installs with no applications and no workload.

After each reboot, the following events are logged on the VMs:

Kernel-Power 41

EventData:
BugcheckCode 265 
BugcheckParameter1 0xa3a01f59e148b50a 
BugcheckParameter2 0xb3b72be033c8b301 
BugcheckParameter3 0x1a0 
BugcheckParameter4 0x7 
SleepInProgress 0 
PowerButtonTimestamp 0 
BootAppStatus 0

BugCheck 1001

EventData 
param1 0x00000109 (0xa3a01f59e148b50a, 0xb3b72be033c8b301, 0x00000000000001a0, 0x0000000000000007) 
param2 C:\Windows\MEMORY.DMP 
param3 021516-3093-01
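
The two events above describe the same stop code: Kernel-Power 41 logs the bugcheck code in decimal, while BugCheck 1001 logs it in hexadecimal. A minimal Python check of that equivalence (the variable names are illustrative only):

```python
# BugcheckCode as logged by Kernel-Power 41 (decimal).
kernel_power_code = 265
# Bugcheck code as logged by BugCheck 1001 in param1 (hexadecimal).
bugcheck_1001_code = 0x00000109

# Both values identify the same stop code.
same_stop_code = kernel_power_code == bugcheck_1001_code
print(hex(kernel_power_code), same_stop_code)
```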

WinDbg output:

CRITICAL_STRUCTURE_CORRUPTION (109)
This bugcheck is generated when the kernel detects that critical kernel code or
data have been corrupted. There are generally three causes for a corruption:
1) A driver has inadvertently or deliberately modified critical kernel code
 or data. See http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx
2) A developer attempted to set a normal kernel breakpoint using a kernel
 debugger that was not attached when the system was booted. Normal breakpoints,
 "bp", can only be set if the debugger is attached at boot time. Hardware
 breakpoints, "ba", can be set at any time.
3) A hardware corruption occurred, e.g. failing RAM holding kernel code or data.
Arguments:
Arg1: a3a01f5a69a8b6bb, Reserved
Arg2: b3b72be0bc28b4a2, Reserved
Arg3: 00000000000001a0, Failure type dependent information
Arg4: 0000000000000007, Type of corrupted region, can be
0 : A generic data region
1 : Modification of a function or .pdata
2 : A processor IDT
3 : A processor GDT
4 : Type 1 process list corruption
5 : Type 2 process list corruption
6 : Debug routine modification
7 : Critical MSR modification

Debugging Details:

PG_MISMATCH:  40000
CUSTOMER_CRASH_COUNT:  1
DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT_SERVER
BUGCHECK_STR:  0x109
PROCESS_NAME:  System
CURRENT_IRQL:  2
ANALYSIS_VERSION: 6.3.9600.17336 (debuggers(dbg).150226-1500) amd64fre
STACK_TEXT:  
ffffd001`1bb7e088 00000000`00000000 : 00000000`00000109 a3a01f5a`69a8b6bb b3b72be0`bc28b4a2 00000000`000001a0 : nt!KeBugCheckEx
STACK_COMMAND:  kb
SYMBOL_NAME:  ANALYSIS_INCONCLUSIVE
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: Unknown_Module
IMAGE_NAME:  Unknown_Image
DEBUG_FLR_IMAGE_TIMESTAMP:  0
IMAGE_VERSION:  
BUCKET_ID:  BAD_STACK
FAILURE_BUCKET_ID:  BAD_STACK
ANALYSIS_SOURCE:  KM
FAILURE_ID_HASH_STRING:  km:bad_stack
FAILURE_ID_HASH:  {75814664-faf6-4b70-bbc7-dc592132ecdd}
Followup: MachineOwner
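
For reading Arg4 across several dumps, the region-type list printed by WinDbg above can be turned into a small lookup. This is only a sketch: the dictionary copies the eight entries shown in the output, and `describe_region` is a hypothetical helper name.

```python
# Region types exactly as listed in the WinDbg output (Arg4 of bugcheck 0x109).
CORRUPTED_REGION_TYPES = {
    0x0: "A generic data region",
    0x1: "Modification of a function or .pdata",
    0x2: "A processor IDT",
    0x3: "A processor GDT",
    0x4: "Type 1 process list corruption",
    0x5: "Type 2 process list corruption",
    0x6: "Debug routine modification",
    0x7: "Critical MSR modification",
}


def describe_region(arg4: int) -> str:
    """Map Arg4 to its description, or say that it is not in the list."""
    return CORRUPTED_REGION_TYPES.get(arg4, f"not in the list (0x{arg4:x})")


# Arg4 from the dump in this question.
print(describe_region(0x7))
```

For the dump above, this prints "Critical MSR modification".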

Sometimes, but not every time a VM fails, this event is logged on the host server:

Hyper-V-Worker 18590

VmErrorCode0 0x109
VmErrorCode1 0xbb8d251d
VmErrorCode2 0xe0d2304
VmErrorCode3 0x1a0
VmErrorCode4 0x7

Could you help me solve this problem please?

Tags: windows, virtual-machines, hyper-v, windows-server-2012-r2, bsod

asked on Server Fault Feb 17, 2016 by devlin • edited Feb 17, 2016 by Drifter104

3 Answers

Solution that worked for me:

Set the following Custom Power Settings under Advanced Power Management Configuration:

Note: The highlighted lines are the important changes, but make sure the other settings are also the same as in the pictures

Other things that I did, which may have helped (I did these before the above, so I'm not sure whether they are relevant):

Installed KB2970215 from Microsoft - this fixes "random blue screens" on specific CPU chipsets
Installed the latest drivers for the Intel Chipset from Supermicro's web site (for me, it is ftp://ftp.supermicro.nl/driver/Intel_INF/C612_Series_Chipset/Chipset_v10.1.2.8.zip - locate one best suited for you)
Installed the latest Network Driver (example: ftp://ftp.supermicro.nl/driver/LAN/Intel/PRO_v20.3.zip)
Installed the RSTE Utility & Driver (example: ftp://ftp.supermicro.nl/driver/SATA/Intel_PCH_RAID_Romley_RSTE/Management/4.3.0.1219/IATA_CD.exe)
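
Whether KB2970215 is already present on a machine can be checked against its list of installed updates. The following Python sketch assumes the list is obtained from the PowerShell `Get-HotFix` cmdlet and that `powershell` is on the PATH; the function names and the sample listing are hypothetical.

```python
import subprocess
from typing import Set


def parse_hotfix_ids(listing: str) -> Set[str]:
    """Collect KB identifiers from plain-text output, one identifier per line."""
    ids = set()
    for line in listing.splitlines():
        token = line.strip().upper()
        if token.startswith("KB"):
            ids.add(token)
    return ids


def is_installed(kb: str, listing: str) -> bool:
    """Return True if the given KB identifier appears in the listing."""
    return kb.upper() in parse_hotfix_ids(listing)


def query_hotfix_ids() -> str:
    """Windows only: ask PowerShell for the IDs of installed hotfixes."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", "(Get-HotFix).HotFixID"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample listing; on a real host use query_hotfix_ids() instead.
sample = "KB2919355\nKB2970215\n"
print(is_installed("KB2970215", sample))
```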

Sources:

https://social.technet.microsoft.com/Forums/en-US/f8ba6d82-b79d-4b17-b13b-269841a9f236/vm-going-down-bugcheck-0x109?forum=winserverhyperv
Supermicro Partner Support

answered on Server Fault Feb 22, 2016 by KeyszerS • edited Feb 25, 2016 by KeyszerS

Setting "Package C State Limit" to "C0/C1 State" causes BSODs (as does setting "Power Technology" to [Disable]). Because I can't use "C0/C1 State", I chose "C2 state", which works without problems. In a nutshell: the higher the Package C State Limit you choose, the more energy efficient the CPU is (by stopping clocks, reducing voltage, and so on).

The best performance settings in this case should be:

Advanced Power Management Configuration:

Power Technology - [Custom]
Energy Performance Tuning - [Disable]
Energy Performance BIAS setting. - [Performance]
Energy Efficient Turbo - [Disable]

CPU P State Control:

EIST(P-States) - [Enable]
Turbo Mode - [Enable]
P-state Coordination - [HW_ALL]

CPU C State Control:

Package C State Limit - [C2 state]
CPU C3 Report - [Disable]
CPU C6 Report - [Disable]
Enhanced Halt State (C1E) - [Disable]
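
The settings listed above can also be kept as a checklist and compared with values read off the BIOS screens. This is a minimal Python sketch with values entered by hand; the `observed` dictionary is made-up sample data and `differences` is a hypothetical helper.

```python
# Settings and values copied from the list above.
RECOMMENDED = {
    "Power Technology": "Custom",
    "Energy Performance Tuning": "Disable",
    "Energy Performance BIAS setting.": "Performance",
    "Energy Efficient Turbo": "Disable",
    "EIST(P-States)": "Enable",
    "Turbo Mode": "Enable",
    "P-state Coordination": "HW_ALL",
    "Package C State Limit": "C2 state",
    "CPU C3 Report": "Disable",
    "CPU C6 Report": "Disable",
    "Enhanced Halt State (C1E)": "Disable",
}


def differences(observed: dict) -> dict:
    """Return {setting: (observed value, recommended value)} for each mismatch."""
    return {
        name: (observed.get(name), wanted)
        for name, wanted in RECOMMENDED.items()
        if observed.get(name) != wanted
    }


# Sample data: identical to the checklist except for one setting.
observed = dict(RECOMMENDED)
observed["Energy Efficient Turbo"] = "Enable"
print(differences(observed))
```

A setting missing from `observed` is reported with `None` as its observed value.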

I found that this type of problem has appeared a few times in the past and was fixed by a ROM update or by a host microcode update such as KB2970215. But I haven't found any working update yet.

Sources:

http://www.supermicro.com/support/faqs/faq.cfm?faq=21555
http://www.supermicro.com/support/faqs/faq.cfm?faq=21499

answered on Server Fault Mar 3, 2016 by devlin

We've had the same errors as well. The answer from @KeyszerS is a really good hint.

It seems that the errors are related to the power management of X10 boards (at least Supermicro ones). I ran several tests with and without power management; sometimes the BSODs occurred more often and sometimes they nearly disappeared.

For a few days now I've had a solution that works reliably (at least for us). We are evaluating stability with 20 VMs on one affected server, and there have been no crashes since.

So, how to get there: the easiest way is to revert the BIOS settings to defaults and just disable "Energy Efficient Turbo".

Update:

No issues at all for around 7 days - the workload seems to be quite stable. Here's a screenshot of the power management settings in the BIOS - the problem seems to be related to "Energy Efficient Turbo".

answered on Server Fault Feb 27, 2016 by Daniel Nachtrub • edited Mar 3, 2016 by Daniel Nachtrub

User contributions licensed under CC BY-SA 3.0