Debugging desktop heap exhaustion

7

I'm currently supporting a product which seems to be consuming a lot of desktop heap. The binaries are mostly .NET, and all run in session 0 as non-interactive processes (they are all sub-processes of an installed Windows service). So, to my knowledge, they shouldn't be consuming any desktop heap.

A few environments have intermittently reported event ID 243 in the system log, followed by event ID 1000 in the application log; the exception code in the application log is always 0xc0000142. Eventually, one of our services also falls over with some cryptic (useless) message. Unfortunately we've never been able to capture the exception, but these all seem to be pretty clear indicators of desktop heap exhaustion.

I'm trying to figure out what is consuming so much desktop heap, so that I can track down the cause. But this is where I'm getting very stuck. Initially I planned to install Desktop Heap Monitor, but after a few failed attempts at getting it to work, I realized that it's not supported on anything past XP. I read somewhere that Process Explorer should be able to give me the same information, so we've been monitoring the following objects in PE:

  1. Handle Count
  2. GDI Objects
  3. USER Objects

The Handle Count value when event 243 gets reported isn't dramatically different from several days earlier when the problem was not occurring, or even from within a few minutes of the process having started up. And the GDI and USER Objects are both zero. So, I'm at a loss as to what exactly could be exhausting the desktop heap, or for that matter, how to debug it any further. I read somewhere that WeakEventManager may cause this type of issue, but we don't seem to be using it anyway.

I've searched this thing to death on both Google and SO, and I've not found anything so far. All I'm really after is a way to determine which process is exhausting the heap, or at least which one is consuming the most. If anybody has any pointers on how to do this, I'd really appreciate it.

heap
asked on Stack Overflow Sep 5, 2017 by another_one • edited Sep 5, 2017 by halfer

1 Answer

8

An old thread, but I thought I'd loop back in case somebody comes across this in the future. After some debugging, we narrowed down which of our processes was causing the problem. I decided to attach WinDbg to the process and set a breakpoint on CreateWindowEx and NtDestroyWindow. Sure enough, CreateWindowEx was indeed being called to create a hidden window; from the params on the stack I was able to get the class of that window (it was always the same), which helped narrow things down further.

Over time, I started to notice that the number of calls to NtDestroyWindow was less than the number of calls to CreateWindowEx. So I stepped down the callstack to look at what was creating the windows ... there was a class constructor and destructor (native, not managed). It seemed that we were not calling the destructor as often as we called the constructor, so over time, we were leaking some instances of these classes, and with each, we were also "leaking" a hidden window, which accumulated over time and caused the desktop heap exhaustion problem. From here, we managed to track down where the instances of that class were not being destroyed, and were able to fix the problem.
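For anyone wanting to make that imbalance visible without counting breakpoint hits by hand, here is a minimal sketch of tallying a saved debugger log. It assumes each breakpoint was set up to print the function name whenever it hit; the log format and the `tally_window_calls` helper are hypothetical, not something we actually used.

```python
def tally_window_calls(log_lines):
    """Count window creations and destructions in a debugger log.

    Assumes (hypothetically) that each breakpoint hit wrote one line
    containing the name of the function that was hit.
    Returns (created, destroyed, outstanding).
    """
    created = 0
    destroyed = 0
    for line in log_lines:
        if "CreateWindowEx" in line:
            created += 1
        elif "NtDestroyWindow" in line:
            destroyed += 1
    # A steadily growing "outstanding" figure suggests leaked windows.
    return created, destroyed, created - destroyed


if __name__ == "__main__":
    sample = ["CreateWindowEx", "CreateWindowEx", "NtDestroyWindow", "CreateWindowEx"]
    print(tally_window_calls(sample))  # (3, 1, 2)
```

An outstanding count that keeps climbing over the life of the process is the pattern to look for; a count that rises and falls back is probably just normal churn.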

Not happy with my lot however, I was curious why Process Explorer hadn't been as useful to me as I had expected it to be. All of this time, it was showing zero USER Objects, even when I knew the process was creating window objects. I then realized that PE can only show this data for processes that are running in the same session as PE itself. So, I had to run it in session zero in order to track the window objects of the service. With a little help from PsExec, and the posts below, I was able to run PE in session zero and switch to it.

https://superuser.com/questions/426868/interactive-session-0-in-windows-7

https://blogs.technet.microsoft.com/home_is_where_i_lay_my_head/2012/10/09/windows-8-interactive-services-detection-error-1-incorrect-function/

From there, I could indeed see the process had over 1,000 User Objects. I was also able to run WinSpy and confirm my findings. Of course, at this stage, it was all academic, but maybe this will be useful for somebody in the future.
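The same kind of counters can also be read from a script using the Win32 `GetGuiResources` function. The following is only a sketch: the `gui_resource_counts` helper is hypothetical, it returns `None` when not running on Windows or when the process can't be opened, and my understanding is that `GetGuiResources` has the same limitation as described above, so the script would need to run in the same session as the target process to see non-zero values.

```python
import ctypes
import os
import sys

GR_GDIOBJECTS = 0    # uiFlags value: count of GDI objects
GR_USEROBJECTS = 1   # uiFlags value: count of USER objects
PROCESS_QUERY_INFORMATION = 0x0400


def gui_resource_counts(pid):
    """Return (gdi_objects, user_objects) for pid.

    Returns None when not on Windows or when the process cannot be opened.
    """
    if sys.platform != "win32":
        return None
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    kernel32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
    kernel32.OpenProcess.restype = wintypes.HANDLE
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
    kernel32.CloseHandle.restype = wintypes.BOOL
    user32.GetGuiResources.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    user32.GetGuiResources.restype = wintypes.DWORD

    handle = kernel32.OpenProcess(PROCESS_QUERY_INFORMATION, False, pid)
    if not handle:
        return None
    try:
        gdi = user32.GetGuiResources(handle, GR_GDIOBJECTS)
        user = user32.GetGuiResources(handle, GR_USEROBJECTS)
        return gdi, user
    finally:
        kernel32.CloseHandle(handle)


if __name__ == "__main__":
    # Query the current process as a simple smoke test.
    print(gui_resource_counts(os.getpid()))
```

Polling this periodically and logging the USER count would show the same steady climb that Process Explorer showed once it was running in session zero.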

answered on Stack Overflow Apr 6, 2018 by another_one

User contributions licensed under CC BY-SA 3.0