I have an Exchange 2016 server that blue-screens roughly every 14 days. The server is virtual and runs in a clustered VMware environment with storage via iSCSI. None of our other Windows servers (including the passive copy of Exchange) blue-screen. The passive Exchange node is being backed up, and the backup clears the transaction logs as it should on both the passive and the active node.
Here is the information BSoD viewer gives me:
052716-21921-01.dmp 27.05.2016 10:22:16 CRITICAL_PROCESS_DIED 0x000000ef ffffe000`de10d080 00000000`00000000 00000000`00000000 00000000`00000000 ntoskrnl.exe ntoskrnl.exe+14e3a0 NT Kernel & System Microsoft® Windows® Operating System Microsoft Corporation 6.3.9600.18289 (winblue_ltsb.160328-1315) x64 ntoskrnl.exe+14e3a0 C:\Windows\Minidump\052716-21921-01.dmp 8 15 9600 138 150 27.05.2016 10:22:47
051516-25765-01.dmp 15.05.2016 10:11:06 CRITICAL_PROCESS_DIED 0x000000ef ffffe001`0ad80900 00000000`00000000 00000000`00000000 00000000`00000000 ntoskrnl.exe ntoskrnl.exe+14e3a0 NT Kernel & System Microsoft® Windows® Operating System Microsoft Corporation 6.3.9600.18289 (winblue_ltsb.160328-1315) x64 ntoskrnl.exe+14e3a0 C:\Windows\Minidump\051516-25765-01.dmp 8 15 9600 138 150 15.05.2016 10:11:41
042816-19328-01.dmp 28.04.2016 22:36:50 CRITICAL_PROCESS_DIED 0x000000ef ffffe001`3da4f900 00000000`00000000 00000000`00000000 00000000`00000000 ntoskrnl.exe ntoskrnl.exe+14e8a0 NT Kernel & System Microsoft® Windows® Operating System Microsoft Corporation 6.3.9600.18289 (winblue_ltsb.160328-1315) x64 ntoskrnl.exe+14e8a0 C:\Windows\Minidump\042816-19328-01.dmp 8 15 9600 294 472 28.04.2016 22:39:45
041916-23859-01.dmp 19.04.2016 08:43:53 CRITICAL_PROCESS_DIED 0x000000ef ffffe001`23101900 00000000`00000000 00000000`00000000 00000000`00000000 ntoskrnl.exe ntoskrnl.exe+14e8a0 NT Kernel & System Microsoft® Windows® Operating System Microsoft Corporation 6.3.9600.18289 (winblue_ltsb.160328-1315) x64 ntoskrnl.exe+14e8a0 C:\Windows\Minidump\041916-23859-01.dmp 8 15 9600 294 472 19.04.2016 08:47:04
I saw a post about the same problem on a different site, but nobody actually answered it and the post aged out.
Does anyone have any pointers on how to fix this? Would I have to install ANOTHER Exchange server and migrate to it? That would be very unfortunate.
Your storage system is failing or too slow to keep up. If I/O is stalled for too long, Exchange assumes the storage is dead and kills wininit.exe to force a hard reset.
See https://technet.microsoft.com/en-us/library/ff625233.aspx and scroll to the end. It's the same for 2013 and 2016.
In some cases, the entire storage stack may be affected by the hang, making it impossible to write failure events to the crimson channel or any other area of the Windows Event Log. ESE also monitors the crimson channel by verifying that the event log can be written to. If writing to the event log fails for a long period of time, MSExchangeRepl intentionally causes a bugcheck of Windows by terminating wininit.exe. When the operating system I/O is hung, the system is obviously unable to write any ESE events to the event log.
I have experienced this firsthand when using Windows Server Backup to back up Exchange. When the backup begins, it runs a consistency check on all databases in parallel. This caused Exchange to BSoD after a few minutes when the storage dropped out.
The first solution is to disable the ATS heartbeat to the storage array: https://kb.vmware.com/kb/2113956
The text is too long to copy, but TL;DR: the connection to your storage array may be dropped under heavy I/O when the 8-second ATS heartbeat times out. That causes an I/O timeout in the VM, which in turn causes Exchange to BSoD.
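As a rough sketch of what that KB change looks like when done from VMware PowerCLI rather than the host shell: the snippet below reads the ATS heartbeat setting and then sets it to 0. It assumes PowerCLI is installed, that you are already connected with Connect-VIServer, and that the host name shown is a placeholder. Check the KB first to confirm it applies to your array and ESXi version.

```powershell
# Assumption: PowerCLI session already connected via Connect-VIServer.
# 'esx01.example.local' is a placeholder host name, not from the original post.
$vmhost = Get-VMHost -Name 'esx01.example.local'

# Show the current value (1 means ATS heartbeat is in use on VMFS5 datastores).
Get-AdvancedSetting -Entity $vmhost -Name 'VMFS3.UseATSForHBOnVMFS5'

# Disable the ATS heartbeat on this host.
# Repeat for every host that shares the affected datastores.
Get-AdvancedSetting -Entity $vmhost -Name 'VMFS3.UseATSForHBOnVMFS5' |
    Set-AdvancedSetting -Value 0 -Confirm:$false
```

The setting is per host, so a cluster where only some hosts have been changed may still show the problem.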
The second solution is to add storage controllers to the VM and distribute the database disks between them. In my case, a single pvscsi controller would choke badly under 6 databases, but once the disks (including the OS disk etc.) were distributed over 4 pvscsi controllers, the issues disappeared. I don't have a reference for that, just personal experience on vSphere 5.5 U3.
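Before moving disks around, it can help to see how they are currently spread over controllers. This is a read-only PowerCLI sketch, assuming an existing Connect-VIServer session; 'EXCH01' is a placeholder VM name and not something from the setup described above.

```powershell
# Assumption: PowerCLI session already connected; 'EXCH01' is a placeholder VM name.
$vm = Get-VM -Name 'EXCH01'

# List every virtual disk together with the SCSI controller it hangs off.
# This only reads configuration; it changes nothing on the VM.
Get-HardDisk -VM $vm |
    Select-Object Name, CapacityGB,
        @{ Name = 'Controller';     Expression = { (Get-ScsiController -HardDisk $_).Name } },
        @{ Name = 'ControllerType'; Expression = { (Get-ScsiController -HardDisk $_).Type } }
```

If every database disk reports the same controller, that matches the single-controller situation described above.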
You can issue a command to disable the forced reboot triggered by ESE. The cause is well explained in Don's answer.
I did this recently for a customer with a single server on ESX, where the I/O load was overwhelming Exchange. (It still is; for example, it takes ages just to open a management console, but at least it doesn't reboot.)
Add-GlobalMonitoringOverride -Identity Exchange\ActiveDirectoryConnectivityConfigDCServerReboot -ItemType Responder -PropertyName Enabled -PropertyValue 0 -ApplyVersion "15.0.712.24"
For -ApplyVersion you need to use the correct version of your own Exchange installation.
See here for Exchange versions: https://technet.microsoft.com/en-us/library/hh135098(v=exchg.150).aspx
See here for further detail: http://www.tecfused.com/2014/11/exchange-2013-dag-bsod/
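A small sketch for finding the version to pass and for checking the override afterwards, run in the Exchange Management Shell. The identity string in the commented-out removal line is an assumption based on the responder named above.

```powershell
# Run in the Exchange Management Shell.
# AdminDisplayVersion prints something like "Version 15.1 (Build 225.42)",
# which corresponds to the dotted form 15.1.225.42 used by -ApplyVersion.
Get-ExchangeServer | Format-List Name, AdminDisplayVersion

# List the global monitoring overrides currently in place,
# to confirm the responder override was actually created.
Get-GlobalMonitoringOverride

# To undo the override later (assumed identity; adjust to match the listing above):
# Remove-GlobalMonitoringOverride -Identity 'Exchange\ActiveDirectoryConnectivityConfigDCServerReboot' -ItemType Responder -PropertyName Enabled
```

An override created with -ApplyVersion only applies to that exact build, so it has to be recreated after installing a cumulative update.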