I'm responsible for a Citrix Presentation Server 4.5 farm. Starting Friday, 30 November, my servers began crashing randomly. So far we've experienced 80 crashes, so it's becoming an increasingly big problem for us. I have 12+ years of experience in IT, so I know the difference between 0 and 1, but I'm having a hard time cracking this.
We've rolled back any recent changes I can think of for different groups of servers, but all groups still seem to crash. I don't have the skills to interpret the memory dumps to find the culprit.
Any help is greatly appreciated. I can also provide links to kernel memory dumps or WinDbg output if necessary.
Thanks!
The majority of the STOP errors we encounter are:
We also see a few 0x0000000a IRQL_NOT_LESS_OR_EQUAL (3%).
For both 0x0000008e and 0x0000007e bug checks, the exception code is 0xc0000005 (Access Violation). When opening dump files in WinDbg, most details are exactly the same, for all the 0x0000008e and 0x0000007e bug checks respectively:
0x0000008e
0x0000007e
About 30% of the crashes happen between 17:00 and 19:00, which leads me to believe this tends to happen more often during logoffs. But then again, only ~15% occur between 15:00 and 17:00.
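One rough way to check the time-of-day pattern is to bucket the crash times into fixed windows and compare the shares. The following is only a minimal sketch: the function name `crash_share_by_window`, its parameters, and the sample timestamps are all hypothetical, and it assumes the crash times have already been collected into a list (for example, by hand from the event log or the dump file dates).

```python
from collections import Counter
from datetime import datetime


def crash_share_by_window(timestamps, window_hours=2, offset=1):
    """Return {window_start_hour: percent of crashes} for fixed-size windows.

    offset shifts the window boundaries; offset=1 with 2-hour windows
    gives 15:00-17:00, 17:00-19:00, and so on.
    """
    if not timestamps:
        return {}
    counts = Counter(
        (((ts.hour - offset) // window_hours) * window_hours + offset) % 24
        for ts in timestamps
    )
    total = len(timestamps)
    return {start: 100.0 * n / total for start, n in sorted(counts.items())}


# Hypothetical sample data, NOT the real crash times from the farm.
sample = [
    datetime(2000, 1, 3, 17, 12), datetime(2000, 1, 3, 18, 40),
    datetime(2000, 1, 4, 17, 55), datetime(2000, 1, 4, 15, 5),
    datetime(2000, 1, 5, 16, 30), datetime(2000, 1, 5, 9, 20),
    datetime(2000, 1, 6, 11, 45), datetime(2000, 1, 6, 13, 10),
    datetime(2000, 1, 7, 2, 15), datetime(2000, 1, 7, 22, 50),
]

shares = crash_share_by_window(sample)
for start, pct in shares.items():
    print(f"{start:02d}:00-{(start + 2) % 24:02d}:00  {pct:.0f}%")
```

Comparing these shares against the number of logons and logoffs in the same windows would show whether the evening peak tracks logoff activity or just overall load.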
We had a similar issue on an older version of Citrix (PS4) that was down to HP print drivers. I had to clear the whole lot off before reinstalling the appropriate ones, and that seemed to clear the blue screen issue. I'm also curious about the "automated deletion of non-approved drivers every night". If you clear non-approved ones down each night, why do you allow them to be installed in the first place? You can stop them being installed in the Citrix policies. I think it is under Printing -> Drivers -> Native printer driver auto-install (set to do not automatically install).
We ended up applying PS 4.5 roll-up pack 7 (which wasn't installed, because it previously broke session reliability for us) and a number of post-R07 hotfixes.
Furthermore, we replaced the newest beta of UPHClean 2.0, which Microsoft have since abandoned as a separate component (it is still built in to later versions of Windows), with the newer UPHClean 1.6g.
The farm has been stable since, but it's still a mystery why all hell suddenly broke loose without us making any major changes.