Azure service dies after a few days


I have a Small size (x2) cloud service that runs and operates just fine for 4-6 days, but then it becomes unresponsive and requires a manual restart through the Azure portal to get it back online.

Windows event logs show that virtual memory is running low. After 2-3 days I start getting:

Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: WaIISHost.exe (3836) consumed 3810709504 bytes, CacheService.exe (1528) consumed 823902208 bytes, and w3wp.exe (1728) consumed 145485824 bytes.

After a while, services start failing (due to the memory problem?):

Application: CacheService.exe Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: Microsoft.ApplicationServer.Caching.ConfigStoreException
Stack:
at Microsoft.ApplicationServer.Caching.CustomProviderProxy+<>c__DisplayClass5.<PerformOperation>b__3(System.Object)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

and eventually HTTP requests start failing:

Process information:
Process ID: 3344
Process name: w3wp.exe
Account name: NT AUTHORITY\NETWORK SERVICE
Exception information:
Exception type: HttpException
Exception message: The paging file is too small for this operation to complete. (Exception from HRESULT: 0x800705AF)
at System.Web.Compilation.BuildManager.ReportTopLevelCompilationException()
at System.Web.Compilation.BuildManager.EnsureTopLevelFilesCompiled()
at System.Web.Hosting.HostingEnvironment.Initialize(ApplicationManager appManager, IApplicationHost appHost, IConfigMapPathFactory configMapPathFactory, HostingEnvironmentParameters hostingParameters, PolicyLevel policyLevel, Exception appDomainCreationException)
The paging file is too small for this operation to complete. (Exception from HRESULT: 0x800705AF)

I'm not sure whether the problem is that virtual memory is configured too low or that memory usage is too high. The virtual memory use of WaIISHost.exe definitely looks quite high, but it seems to stabilize at around 4.1 GB.

I do not know why WaIISHost.exe would need so much, since the Run() method does only very light housekeeping, such as pinging the site every few minutes to keep the application running during the daytime.
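To be concrete, the loop is something like the following (a simplified sketch only; the URL and interval here are placeholders, not the real values):

using System;
using System.Diagnostics;
using System.Net;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint
{
    public override void Run()
    {
        while (true)
        {
            try
            {
                // Dispose the response every iteration; an undisposed
                // HttpWebResponse in a loop like this is a classic way
                // to grow memory slowly over days.
                var request = (HttpWebRequest)WebRequest.Create("http://myservice.example.com/");
                using (var response = (HttpWebResponse)request.GetResponse())
                {
                    Trace.TraceInformation("Ping returned {0}", response.StatusCode);
                }
            }
            catch (WebException ex)
            {
                Trace.TraceWarning("Ping failed: {0}", ex.Message);
            }

            // Ping every few minutes to keep the application warm.
            Thread.Sleep(TimeSpan.FromMinutes(5));
        }
    }
}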

Available Memory monitoring in the Azure portal shows the service having 300-800 MB of free memory during the whole period before the crash.

Any ideas what the problem is? How can I configure more virtual memory?

azure
memory-leaks
virtual
asked on Stack Overflow May 8, 2013 by user1969169

1 Answer


In my experience, you have a memory leak in your app code; the exceptions you are seeing in Azure are just the after-effect of the environment running out of memory because the app code has consumed all of it.

Normally I would RDP into the box, identify the process that has the memory leak, and then start a profiling session of your code with a Redgate tool (or any other memory profiler). Run your process for half a day under that tool and you should easily identify what the problem is.
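If you want a rough confirmation before setting up a full profiling session, even a small console app run over RDP can show which process is actually growing; a minimal sketch (run it a few times some hours apart and compare the numbers):

using System;
using System.Diagnostics;
using System.Linq;

class TopMemoryProcesses
{
    static void Main()
    {
        // List the top processes by private and virtual memory so you can
        // see which one (WaIISHost.exe, CacheService.exe, w3wp.exe, ...)
        // is actually growing between runs.
        var top = Process.GetProcesses()
                         .OrderByDescending(p => p.PrivateMemorySize64)
                         .Take(10);

        foreach (var p in top)
        {
            Console.WriteLine("{0,-20} {1,10:N0} MB private {2,10:N0} MB virtual",
                              p.ProcessName,
                              p.PrivateMemorySize64 / (1024 * 1024),
                              p.VirtualMemorySize64 / (1024 * 1024));
        }
    }
}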

My guess is that you will find something like an exception causing your code not to clean up correctly; that would explain why you only see the problem after a couple of days and a couple of exceptions. I would look at your exception handling and clean-up code, and test for disconnects and other transient conditions that might affect your code. See the sketch below for the kind of pattern I mean.
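As an illustration, and assuming the cache is accessed through the Azure Caching / AppFabric client (Microsoft.ApplicationServer.Caching), something along these lines keeps the factory long-lived and handles transient cache failures instead of letting them unwind past the clean-up:

using System;
using System.Diagnostics;
using Microsoft.ApplicationServer.Caching;

public static class CacheAccess
{
    // A DataCacheFactory is expensive and holds connections; creating one
    // per call and never disposing it is a common source of slow leaks.
    // Keep a single long-lived factory for the process instead.
    private static readonly DataCacheFactory Factory = new DataCacheFactory();

    public static object GetOrDefault(string key)
    {
        try
        {
            DataCache cache = Factory.GetDefaultCache();
            return cache.Get(key);
        }
        catch (DataCacheException ex)
        {
            // Handle transient failures (disconnects, retries, timeouts)
            // here rather than letting them propagate past clean-up code.
            Trace.TraceWarning("Cache call failed with error code {0}", ex.ErrorCode);
            return null;
        }
    }
}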

I would also add something like New Relic or Foglight so that you get a clear heads-up when your code/Azure instance is about to fail. You can also see the timescales of your failures, get more logging information, and possibly spot whatever is causing the issues you are experiencing.

answered on Stack Overflow May 8, 2013 by JamesKn

User contributions licensed under CC BY-SA 3.0