How to debug: w3wp.exe process was terminated due to a stack overflow (works on one machine but not another)

41

The problem
I have an ASP.NET 4.0 application that crashes with a stack overflow on one computer, but not another. It runs fine in my development environment. When I move the site to the production server, it throws a stack overflow exception (seen in the event log), and the w3wp.exe worker process dies and is replaced with another.

What I've tried so far
For reference, I used the Debug Diagnostic Tool (DebugDiag) to try to determine which piece of code is causing the overflow, but I'm not sure how to interpret its output, which is included below.

How might an ASP.NET website cause a stack overflow on one machine but not on another?
Leads from anyone with experience of this are appreciated. I'll post the resulting solution below the answer that leads me to it.

Debug Output

Application: w3wp.exe Framework Version: v4.0.30319 Description: The process was terminated due to stack overflow.

In w3wp__PID__5112__Date__02_18_2011__Time_09_07_31PM__671__First Chance Stack Overflow.dmp the assembly instruction at nlssorting!SortGetSortKey+25 in C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\nlssorting.dll from Microsoft Corporation has caused a stack overflow exception (0xC00000FD) when trying to write to memory location 0x01d12fc0 on thread 16
Please follow up with the vendor Microsoft Corporation for C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\nlssorting.dll
Information:DebugDiag determined that this dump file (w3wp__PID__5112__Date__02_18_2011__Time_09_07_31PM__671__First Chance Stack Overflow.dmp) is a crash dump and did not perform any hang analysis. If you wish to enable combined crash and hang analysis for crash dumps, edit the IISAnalysis.asp script (located in the DebugDiag\Scripts folder) and set the g_DoCombinedAnalysis constant to True.
Entry point   clr!ThreadpoolMgr::intermediateThreadProc 
Create time   2/18/2011 9:07:10 PM 
Function     Arg 1     Arg 2     Arg 3   Source 
nlssorting!SortGetSortKey+25     01115a98     00000001     0651a88c    
clr!SortVersioning::SortDllGetSortKey+3b     01115a98     08000001     0651a88c    
clr!COMNlsInfo::InternalGetGlobalizedHashCode+f0     01115a98     05e90268     0651a88c    
mscorlib_ni+2becff     08000001     0000000f     0651a884    
mscorlib_ni+255c10     00000001     09ed57bc     01d14348    
mscorlib_ni+255bc4     79b29e90     01d14350     79b39ab0    
mscorlib_ni+2a9eb8     01d14364     79b39a53     000dbb78    
mscorlib_ni+2b9ab0     000dbb78     09ed57bc     01ff39f4    
mscorlib_ni+2b9a53     01d14398     01d1439c     00000011    
mscorlib_ni+2b9948     0651a884     01d143ec     7a97bf5d    
System_ni+15bd65     6785b114     00000000     09ed5748    
System_ni+15bf5d     1c5ab292     1b3c01dc     05ebc494    
System_Web_ni+6fb165 
***These lines below are repeated many times in the log, so I just posted one block of them
1c5a928c     00000000     0627e880     000192ba    
1c5a9dce     00000000     0627e7c4     00000000    
1c5a93ce     1b3c01dc     05ebc494     1b3c01dc    
1c5a92e2
.....(repeated sequence from above)
System_Web_ni+16779c     1b338528     00000003     0629b7a0    
System_Web_ni+1677fb     00000000     00000017     0629ac3c    
System_Web_ni+167843     00000000     00000003     0629ab78    
System_Web_ni+167843     00000000     00000005     0629963c    
System_Web_ni+167843     00000000     00000001     0627e290    
System_Web_ni+167843     00000000     0627e290     1a813508    
System_Web_ni+167843     01d4f21c     79141c49     79141c5c    
System_Web_ni+1651c0     00000001     0627e290     00000000    
System_Web_ni+16478d     00000001     01ea7730     01ea76dc    
System_Web_ni+1646af     0627e290     01d4f4c0     672c43f2    
System_Web_ni+164646     00000000     06273aa8     0627e290    
System_Web_ni+1643f2     672d1b65     06273aa8     00000000    
1c5a41b5     00000000     01d4f520     06273aa8    
System_Web_ni+18610c     01d4f55c     0df2a42c     06273f14    
System_Web_ni+19c0fe     01d4fa08     0df2a42c     06273e5c    
System_Web_ni+152ccd     06273aa8     05e9f214     06273aa8    
System_Web_ni+19a8e2     05e973b4     062736cc     01d4f65c    
System_Web_ni+19a62d     06a21c6c     79145d80     01d4f7fc    
System_Web_ni+199c2d     00000002     672695e8     00000000    
System_Web_ni+7b65cc     01d4fa28     00000002     01c52c0c    
clr!COMToCLRDispatchHelper+28     679165b0     672695e8     09ee2038    
clr!BaseWrapper<Stub *,FunctionBase<Stub *,&DoNothing<Stub *>,&StubRelease<Stub>,2>,0,&CompareDefault<Stub *>,2>::~BaseWrapper<Stub *,FunctionBase<Stub *,&DoNothing<Stub *>,&StubRelease<Stub>,2>,0,&CompareDefault<Stub *>,2>+fa     672695e8     09ee2038     00000001    
clr!COMToCLRWorkerBody+b4     000dbb78     01d4f9f8     1a78ffe0    
clr!COMToCLRWorkerDebuggerWrapper+34     000dbb78     01d4f9f8     1a78ffe0    
clr!COMToCLRWorker+614     000dbb78     01d4f9f8     06a21c6c    
1dda1aa     00000001     01b6c7a8     00000000    
webengine4!HttpCompletion::ProcessRequestInManagedCode+1cd     01b6c7a8     69f1aa72     01d4fd6c    
webengine4!HttpCompletion::ProcessCompletion+4a     01b6c7a8     00000000     00000000    
webengine4!CorThreadPoolWorkitemCallback+1c     01b6c7a8     0636a718     0000ffff    
clr!UnManagedPerAppDomainTPCount::DispatchWorkItem+195     01d4fe1f     01d4fe1e     0636a488    
clr!ThreadpoolMgr::NewWorkerThreadStart+20b     00000000     0636a430     00000000    
clr!ThreadpoolMgr::WorkerThreadStart+3d1     00000000     00000000     00000000    
clr!ThreadpoolMgr::intermediateThreadProc+4b     000c3470     00000000     00000000    
kernel32!BaseThreadStart+34     792b0b2b     000c3470     00000000    
asp.net
debugging
stack-overflow
asked on Stack Overflow Feb 19, 2011 by Kevin • edited Jun 28, 2011 by Peter Mortensen

5 Answers

36

This question is a bit old, but I just found a nice way of getting the stack trace of my application just before it overflowed, and I would like to share it with other googlers out there:

  1. When your ASP.NET app crashes, a set of debugging files is dumped in a "crash folder" inside this main folder:

    C:\ProgramData\Microsoft\Windows\WER\ReportQueue

  2. These files can be analysed using WinDbg, which ships with Microsoft's Debugging Tools for Windows.

  3. After installing it on the same machine where your app crashed, click File > Open Crash Dump and select the largest .tmp file in your "crash folder" (mine was 180 MB). Something like:

    AppCrash_w3wp.exe_3d6ded0d29abf2144c567e08f6b23316ff3a7_cab_849897b9\WER688D.tmp

  4. Then, run the following commands in the command window that just opened:

    .loadby sos clr
    !clrstack
    
  5. Finally, the generated output will contain your app stack trace just before overflowing, and you can easily track down what caused the overflow. In my case it was a buggy logging method:

    000000dea63aed30 000007fd88dea0c3 Library.Logging.ExceptionInfo..ctor(System.Exception)
    000000dea63aedd0 000007fd88dea0c3 Library.Logging.ExceptionInfo..ctor(System.Exception)
    000000dea63aee70 000007fd88dea0c3 Library.Logging.ExceptionInfo..ctor(System.Exception)
    000000dea63aef10 000007fd88dea0c3 Library.Logging.ExceptionInfo..ctor(System.Exception)
    000000dea63aefb0 000007fd88de9d00 Library.Logging.RepositoryLogger.Error(System.Object, System.Exception)
    000000dea63af040 000007fd88de9ba0 Library.WebServices.ErrorLogger.ProvideFault(System.Exception, System.ServiceModel.Channels.MessageVersion, System.ServiceModel.Channels.Message ByRef)
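
For illustration only, here is a minimal sketch of how a constructor like this could recurse without end, and one way to bound it. The real Library.Logging.ExceptionInfo source is not shown in this answer, so the class body, the Inner property and the MaxDepth cap below are all assumptions:

```csharp
using System;

// Hypothetical reconstruction: NOT the actual Library.Logging.ExceptionInfo.
public sealed class ExceptionInfo
{
    // Assumed safety cap on how many nested exceptions get wrapped.
    private const int MaxDepth = 32;

    public string Message { get; private set; }
    public ExceptionInfo Inner { get; private set; }

    public ExceptionInfo(Exception ex) : this(ex, 0) { }

    private ExceptionInfo(Exception ex, int depth)
    {
        if (ex == null) throw new ArgumentNullException("ex");
        Message = ex.Message;

        // A buggy version might pass `ex` here instead of `ex.InnerException`,
        // so the constructor would call itself until the stack is exhausted.
        if (ex.InnerException != null && depth < MaxDepth)
        {
            Inner = new ExceptionInfo(ex.InnerException, depth + 1);
        }
    }
}
```

A trace with the same frame repeated over and over, as above, is the usual sign of this kind of unbounded self-call.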
    

Thanks to Paul White and his blog post: Debugging Faulting Application w3wp.exe Crashes

answered on Stack Overflow Nov 9, 2012 by Thomas C. G. de Vilhena • edited Apr 3, 2018 by Michael
5

The default stack limit for w3wp.exe is a joke. I always raise it with editbin /stack:9000000 w3wp.exe, which should be sufficient. Get rid of your stack overflow first, and then debug whatever you want.

answered on Stack Overflow Feb 19, 2011 by SK-logic
3

Get a crash dump, run it against Microsoft's Debug Diagnostic Tool and show us the result.

Also take a look at http://support.microsoft.com/kb/919789/en-us, which explains all the necessary steps in detail.

answered on Stack Overflow Aug 5, 2011 by VVS
1

Two things I would try before analysing any memory dumps.

  1. Install the remote debugging tool on the web server and try debugging that way. You can find this tool on the Visual Studio install DVD.
  2. Install ELMAH. ELMAH can be added to a running ASP.NET application for logging and debugging. I would probably go with this option first, as it's the least painful approach. http://code.google.com/p/elmah/
answered on Stack Overflow Aug 10, 2011 by Vince Panuccio
1

One possibility for your application behaving differently in production vs development could be preprocessor directives like #if DEBUG in the code. When you deploy to production the release build would have different code segments than your debug build.
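
As a small sketch of that point (the BuildInfo class and its method are made up for illustration), code guarded by #if DEBUG is compiled only when the DEBUG symbol is defined, which is normally true for Debug builds and not for Release builds:

```csharp
using System;

// Hypothetical helper: reports which branch was compiled in.
public static class BuildInfo
{
    public static string Configuration()
    {
#if DEBUG
        // Only present in the assembly when DEBUG is defined.
        return "Debug";
#else
        // Compiled instead when DEBUG is not defined (typical Release build).
        return "Release";
#endif
    }
}
```

Logging BuildInfo.Configuration() at startup on both machines would be one quick way to confirm whether they are actually running the same build flavor.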

Another option would be that your application is throwing an unrelated exception in production, and the error handling code somehow ends up in an infinite function calling loop. You may want to look for a function that calls itself, or that calls another function which calls it back, with no terminating condition. Such infinite recursion, rather than an infinite for or while loop on its own, is what exhausts the stack. I apologize for going overboard with the word 'infinite'.
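
A minimal sketch of one way to keep an error handler from re-entering itself, assuming a hypothetical SafeErrorHandler class and a pluggable Log delegate (neither comes from the question):

```csharp
using System;

// Hypothetical handler with a per-thread re-entrancy guard.
public static class SafeErrorHandler
{
    [ThreadStatic]
    private static bool _handling;

    // Assumed logging sink; swap in whatever logger the application uses.
    public static Action<string> Log = Console.Error.WriteLine;

    // Returns true if the exception was logged, false if it was skipped.
    public static bool Handle(Exception ex)
    {
        // Already inside Handle on this thread: stop instead of recursing.
        if (_handling) return false;

        _handling = true;
        try
        {
            Log(ex.ToString());
            return true;
        }
        catch (Exception)
        {
            // The logger itself failed; swallow it rather than calling Handle again.
            return false;
        }
        finally
        {
            _handling = false;
        }
    }
}
```

With a guard like this, a logger that fails and routes its own failure back into the handler gets cut off after one level instead of growing the stack.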

It's also happened to me before when I accidentally created a property and returned the property inside my property. Like:

public string SomeProperty { get { return SomeProperty; } }
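
The getter above calls itself, so reading the property never returns. A corrected sketch, assuming a backing field named _someProperty (the field name, the Example class and the initial value are only for illustration):

```csharp
public class Example
{
    // Backing field that actually stores the value.
    private string _someProperty = "value";

    public string SomeProperty
    {
        // Return the field, not the property itself.
        get { return _someProperty; }
        set { _someProperty = value; }
    }
}
```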

Also, if possible, you could do special handling of the exception in the Application_Error function of your Global.asax. Use Server.GetLastError() to get the exception and log/display the stack trace. You may want to do the same for any inner exceptions, or inner exceptions of inner exceptions, and so on.
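
Something along these lines might work as a starting point. The ExceptionDump helper, its maxDepth parameter and the use of Trace.TraceError are my own assumptions; only Application_Error and Server.GetLastError() come from ASP.NET itself. Keep in mind that the stack overflow itself kills the process and will not reach this handler, so the goal is to capture whatever exception happens before it:

```csharp
using System;
using System.Text;

// Hypothetical helper that walks the InnerException chain.
public static class ExceptionDump
{
    public static string Describe(Exception ex, int maxDepth = 20)
    {
        var sb = new StringBuilder();
        int depth = 0;

        // Iterate rather than recurse, and cap the depth, so the
        // diagnostic code cannot itself run away.
        for (Exception e = ex; e != null && depth < maxDepth; e = e.InnerException, depth++)
        {
            sb.AppendLine(string.Format("[{0}] {1}: {2}", depth, e.GetType().FullName, e.Message));
            sb.AppendLine(e.StackTrace ?? "(no stack trace)");
        }
        return sb.ToString();
    }
}

// In Global.asax.cs (needs System.Web, so shown here as a comment):
//
// protected void Application_Error(object sender, EventArgs e)
// {
//     Exception ex = Server.GetLastError();
//     if (ex != null)
//     {
//         System.Diagnostics.Trace.TraceError(ExceptionDump.Describe(ex));
//     }
// }
```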

You may already be doing the above mentioned things but I wanted to mention them just in case.

Also, from your trace it looks like the error is happening in GetSortKey. Is that a function in your code? If so, then your infinite self-calling may start there.

Hope this helps.

answered on Stack Overflow Aug 20, 2011 by Nabheet • edited Apr 3, 2018 by Michael

User contributions licensed under CC BY-SA 3.0