StackOverflowException with Lazy<T> when throwing Exception

9

A very simple sample application (.NET 4.6.2) produces a StackOverflowException at a recursion depth of 12737, which reduces to a recursion depth of 10243, if the most inner function call throws an exception, which is expected and OK.

If i use a Lazy<T> to shortly hold an intermediate result, the StackOverflowException already occurs at a recursion depth of 2207, if no exception is thrown and at a recursion depth of 105, if an exception is thrown.

Note: The StackOverflowException at a depth of 105 is only observable if compiled to x64. With x86 (32-Bit), the effect first occurs at a depth of 4272. Mono (like used at https://repl.it) works without problems up to a depth of 74200.

The StackOverflowException does not occur within the deep recursion, but while ascending back to the main routine. The finally block is processed up some depth, then the program dies:

Exception System.InvalidOperationException at 105
Finally at 105
...
Exception System.InvalidOperationException at 55
Finally at 55
Exception System.InvalidOperationException at 54
Finally at 54
Process is terminated due to StackOverflowException.

or within the debugger:

The program '[xxxxx] Test.vshost.exe' has exited with code -2147023895 (0x800703e9).

Who might be able to explain this?

public class Program
{
    private class Test
    {
        private int maxDepth;

        private int CalculateWithLazy(int depth)
        {
            try
            {
                var lazy = new Lazy<int>(() => this.Calculate(depth));
                return lazy.Value;
            }  
            catch (Exception e)
            {
                Console.WriteLine("Exception " + e.GetType() + " at " + depth);
                throw;
            }
            finally
            {
                Console.WriteLine("Finally at " + depth);
            }
        }

        private int Calculate(int depth)
        {
            if (depth >= this.maxDepth) throw new InvalidOperationException("Max. recursion depth reached.");
            return this.CalculateWithLazy(depth + 1);
        }

        public void Run()
        {
            for (int i = 1; i < 100000; i++)
            {
                this.maxDepth = i;

                try
                {
                    Console.WriteLine("MaxDepth: " + i);
                    this.CalculateWithLazy(0);

                }
                catch { /* ignore */ }
            }
        }
    }

    public static void Main(string[] args)
    {
        var test = new Test();
        test.Run();
        Console.Read();
    }

Update: The problem can be reproduced without the usage of Lazy<T>, just by having a try-catch-throw block in the recursive method.

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int Calculate(int depth)
        {
            try
            {
                if (depth >= this.maxDepth) throw new InvalidOperationException("Max. recursion depth reached.");
                return this.Calculate2(depth + 1);
            }
            catch
            {
                throw;
            }
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int Calculate2(int depth) // just to prevent the compiler from tail-recursion-optimization
        {
            return this.Calculate(depth);
        }

        public void Run()
        {
            for (int i = 1; i < 100000; i++)
            {
                this.maxDepth = i;

                try
                {
                    Console.WriteLine("MaxDepth: " + i);
                    this.Calculate(0);

                }
                catch(Exception e)
                {
                    Console.WriteLine("Finished with " + e.GetType());
                }
            }
        }
c#
asked on Stack Overflow Nov 18, 2017 by Rauhotz • edited Nov 19, 2017 by Rauhotz

3 Answers

8

The problem can be reproduced without the usage of Lazy<T>, just by having a try-catch-throw block in the recursive method.

You've already noticed the source of the behavior. Now let me explain why, as this makes no sense, right?

It makes no sense, because the exception is catched and then immediately rethrown, so the stack should shrink, right?

The following test:

internal class Program
{
    private int _maxDepth;

    [MethodImpl(MethodImplOptions.NoInlining)]
    private int Calculate(int depth)
    {
        try
        {
            Console.WriteLine("In try at depth {0}: stack frame count = {1}", depth, new StackTrace().FrameCount);

            if (depth >= _maxDepth)
                throw new InvalidOperationException("Max. recursion depth reached.");

            return Calculate2(depth + 1);
        }
        catch
        {
            Console.WriteLine("In catch at depth {0}: stack frame count = {1}", depth, new StackTrace().FrameCount);
            throw;
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private int Calculate2(int depth) => Calculate(depth);

    public void Run()
    {
        try
        {
            _maxDepth = 10;
            Calculate(0);
        }
        catch (Exception e)
        {
            Console.WriteLine("Finished with " + e.GetType());
        }
    }

    public static void Main() => new Program().Run();
}

Seems to validate that hypothesis:

In try at depth 0: stack frame count = 3
In try at depth 1: stack frame count = 5
In try at depth 2: stack frame count = 7
In try at depth 3: stack frame count = 9
In try at depth 4: stack frame count = 11
In try at depth 5: stack frame count = 13
In try at depth 6: stack frame count = 15
In try at depth 7: stack frame count = 17
In try at depth 8: stack frame count = 19
In try at depth 9: stack frame count = 21
In try at depth 10: stack frame count = 23
In catch at depth 10: stack frame count = 23
In catch at depth 9: stack frame count = 21
In catch at depth 8: stack frame count = 19
In catch at depth 7: stack frame count = 17
In catch at depth 6: stack frame count = 15
In catch at depth 5: stack frame count = 13
In catch at depth 4: stack frame count = 11
In catch at depth 3: stack frame count = 9
In catch at depth 2: stack frame count = 7
In catch at depth 1: stack frame count = 5
In catch at depth 0: stack frame count = 3
Finished with System.InvalidOperationException

Well... no. It's not that simple.


.NET exceptions are built on top of Windows Structured Exception Handling (SEH), which is a complicated beast.

There are two articles you need to read if you want to know the details. They're old, but the parts relevant to your question are still accurate:

But let's concentrate on the matter at hand, which is stack unwinding.

Here's what the first one says what happens when the stack is unwound (emphasis mine):

The other form of unwind is the actual popping of the CPU stack. This doesn’t happen as eagerly as the popping of the SEH records. On X86, EBP is used as the frame pointer for methods containing SEH. ESP points to the top of the stack, as always. Until the stack is actually unwound, all the handlers are executed on top of the faulting exception frame. So the stack actually grows when a handler is called for the first or second pass. EBP is set to the frame of the method containing a filter or finally clause so that local variables of that method will be in scope.

The actual popping of the stack doesn’t occur until the catching ‘except’ clause is executed.

Let's modify our previous test program to check this:

internal class Program
{
    private int _maxDepth;
    private UIntPtr _stackStart;

    [MethodImpl(MethodImplOptions.NoInlining)]
    private int Calculate(int depth)
    {
        try
        {
            Console.WriteLine("In try at depth {0}: stack frame count = {1}, stack offset = {2}",depth, new StackTrace().FrameCount, GetLooseStackOffset());

            if (depth >= _maxDepth)
                throw new InvalidOperationException("Max. recursion depth reached.");

            return Calculate2(depth + 1);
        }
        catch
        {
            Console.WriteLine("In catch at depth {0}: stack frame count = {1}, stack offset = {2}", depth, new StackTrace().FrameCount, GetLooseStackOffset());
            throw;
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private int Calculate2(int depth) => Calculate(depth);

    public void Run()
    {
        try
        {
            _stackStart = GetSomePointerNearTheTopOfTheStack();
            _maxDepth = 10;
            Calculate(0);
        }
        catch (Exception e)
        {
            Console.WriteLine("Finished with " + e.GetType());
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static unsafe UIntPtr GetSomePointerNearTheTopOfTheStack()
    {
        int dummy;
        return new UIntPtr(&dummy);
    }

    private int GetLooseStackOffset() => (int)((ulong)_stackStart - (ulong)GetSomePointerNearTheTopOfTheStack());

    public static void Main() => new Program().Run();
}

Here's the result:

In try at depth 0: stack frame count = 3, stack offset = 384
In try at depth 1: stack frame count = 5, stack offset = 752
In try at depth 2: stack frame count = 7, stack offset = 1120
In try at depth 3: stack frame count = 9, stack offset = 1488
In try at depth 4: stack frame count = 11, stack offset = 1856
In try at depth 5: stack frame count = 13, stack offset = 2224
In try at depth 6: stack frame count = 15, stack offset = 2592
In try at depth 7: stack frame count = 17, stack offset = 2960
In try at depth 8: stack frame count = 19, stack offset = 3328
In try at depth 9: stack frame count = 21, stack offset = 3696
In try at depth 10: stack frame count = 23, stack offset = 4064
In catch at depth 10: stack frame count = 23, stack offset = 13024
In catch at depth 9: stack frame count = 21, stack offset = 21888
In catch at depth 8: stack frame count = 19, stack offset = 30752
In catch at depth 7: stack frame count = 17, stack offset = 39616
In catch at depth 6: stack frame count = 15, stack offset = 48480
In catch at depth 5: stack frame count = 13, stack offset = 57344
In catch at depth 4: stack frame count = 11, stack offset = 66208
In catch at depth 3: stack frame count = 9, stack offset = 75072
In catch at depth 2: stack frame count = 7, stack offset = 83936
In catch at depth 1: stack frame count = 5, stack offset = 92800
In catch at depth 0: stack frame count = 3, stack offset = 101664
Finished with System.InvalidOperationException

Oops. Yes, the stack actually grows while we're handling the exceptions.

At _maxDepth = 1000, the execution ends at:

In catch at depth 933: stack frame count = 1869, stack offset = 971232
In catch at depth 932: stack frame count = 1867, stack offset = 980096
In catch at depth 931: stack frame count = 1865, stack offset = 988960
In catch at depth 930: stack frame count = 1863, stack offset = 997824

Process is terminated due to StackOverflowException.

So about 997824 bytes of used stack space by our own code, which is pretty close to the 1 MB default thread stack size on Windows. The calling CLR code should make up for the difference.


As you may know, SEH exceptions are handled in two passes:

  • The first pass (filtering) finds the first exception handler that is able to handle the exception. In C#, this basically checks whether the catch clause matches the correct exception type, and executes the when part of catch (...) when (...) if one is present.
  • The second pass (unwinding) actually handles the exception.

Here's what the second article says on the unwinding process:

When an exception occurs, the system walks the list of EXCEPTION_REGISTRATION structures until it finds a handler for the exception. Once a handler is found, the system walks the list again, up to the node that will handle the exception. During this second traversal, the system calls each handler function a second time. The key distinction is that in the second call, the value 2 is set in the exception flags. This value corresponds to EH_UNWINDING.

[...]

After an exception is handled and all the previous exception frames have been called to unwind, execution continues wherever the handling callback decides.

This only confirms what the first article said.

The first pass needs to have the faulting stack preserved to be able to inspect its state, and to be able to resume execution on the faulting instruction (yes, that's a thing - it's very low-level but you can patch the cause of the error and resume execution as if there was no error in the first place).

The second pass is implemented just like the first one, except the handlers now get the EH_UNWINDING flag. This means the faulting stack is still preserved at that point, until the final handler decides where to resume execution.


The stack pointer moves 36 bytes for a 32-Bit process, but whopping 8976 bytes for a 64-bit process here while unwinding the stack. What's the explanation for this?

Good question!

That's because 32-bit and 64-bit SEH are completely different. Here's some reading material (emphasis mine):

Because on the x86 each function that uses SEH has this aforementioned construct as part of its prolog, the x86 is said to use frame based exception handling. There are a couple of problems with this approach:

  • Because the exception information is stored on the stack, it is susceptible to buffer overflow attacks.
  • Overhead. Exceptions are, well, exceptional, which means the exception will not occur in the common case. Regardless, every time a function is entered that uses SEH, these extra instructions are executed.

Because the x64 was a chance to do away with a lot of the cruft that had been hanging around for decades, SEH got an overhaul that addressed both issues mentioned above. On the x64, SEH has become table-based, which means when the source code is compiled, a table is created that fully describes all the exception handling code within the module. This table is then stored as part of the PE header. If an exception occurs, the exception table is parsed by Windows to find the appropriate exception handler to execute. Because exception handling information is tucked safely away in the PE header, it is no longer susceptible to buffer overflow attacks. In addition, because the exception table is generated as part of the compilation process, no runtime overhead (in the form of push and pop instructions) is incurred during normal processing.

Of course, table-based exception handling schemes have a couple of negative aspects of their own. For example, table-based schemes tend to take more space in memory than stack-based schemes. Also, while overhead in the normal execution path is reduced, the overhead it takes to process an exception is significantly higher than in frame-based approaches. Like everything in life, there are trade-offs to consider when evaluating whether the table-based or a frame-based approach to exception handling is "best."

In short, the happy path has been optimized in x64, while the exceptional path has become slower. If you ask me, that's a very good thing.

Here's another citation from the first article I linked to earlier:

Both IA64 and AMD64 have a model for exception handling that avoids reliance on an explicit handler chain that starts in TLS and is threaded through the stack. Instead, exception handling relies on the fact that on 64-bit systems we can perfectly unwind a stack. And this ability is itself due to the fact that these chips are severely constrained on the calling conventions they support.

[...]

Anyway, on 64-bit systems the correspondence between an activation record on the stack and the exception record that applies to it is not achieved through an FS:[0] chain. Instead, unwinding of the stack reveals the code addresses that correspond to a particular activation record. These instruction pointers of the method are looked up in a table to find out whether there are any__try/__except/__finally clauses that cover these code addresses. This table also indicates how to proceed with the unwind by describing the actions of the method epilog.

So, yeah. Completely different approach.

But let's take a look at the x64 call stack to see where the stack space is actually used. I modified Calculate as follows:

private int Calculate(int depth)
{
    try
    {
        if (depth >= _maxDepth)
            throw new InvalidOperationException("Max. recursion depth reached.");

        return Calculate2(depth + 1);
    }
    catch
    {
        if (depth == _maxDepth)
        {
            Console.ReadLine();
        }

        throw;
    }
}

I put a breakpoint on Console.ReadLine(); and took a look at the native call stack along with the value of the stack pointer register on each frame:

native call stack

The addresses fffffffffffffffe and 0000000000008000 look very odd to me, but anyway this shows how much stack space each frame consumes. A lot of stuff is going on in the Windows Native API (ntdll.dll), and the CLR adds some.

We're out of luck as far as the internal Windows stuff goes, as the source code is not publicly available. But we can at least take a look at clr.dll!ClrUnwindEx, as that function is quite small but uses quite a lot of stack space:

void ClrUnwindEx(EXCEPTION_RECORD* pExceptionRecord, UINT_PTR ReturnValue, UINT_PTR TargetIP, UINT_PTR TargetFrameSp)
{
    PVOID TargetFrame = (PVOID)TargetFrameSp;

    CONTEXT ctx;
    RtlUnwindEx(TargetFrame,
                (PVOID)TargetIP,
                pExceptionRecord,
                (PVOID)ReturnValue, // ReturnValue
                &ctx,
                NULL);      // HistoryTable

    // doesn't return
    UNREACHABLE();
}

It defines a CONTEXT variable on the stack, which is a large struct. I can only hypothesize that the 64-bit SEH functions use their stack space for similar purposes.

Now let's compare this to the 32-bit call stack:

32-bit call stack

As you can see, it's not the same thing as in 64-bit at all.

Out of curiosity, I tested the behavior of a simple C++ program:

#include "stdafx.h"
#include <iostream>

__declspec(noinline) static char* GetSomePointerNearTheTopOfTheStack()
{
    char dummy;
    return &dummy;
}

int main()
{
    auto start = GetSomePointerNearTheTopOfTheStack();

    try
    {
        throw 42;
    }
    catch (...)
    {
        auto here = GetSomePointerNearTheTopOfTheStack();
        std::cout << "Difference in " << (sizeof(char*) * 8) << "-bit: " << (start - here) << std::endl;
    }

    return 0;
}

Here are the results:

Difference in 32-bit: 2224
Difference in 64-bit: 9744

Which further shows that it's not only a CLR thing, but it's due to the underlying SEH implementation difference.

answered on Stack Overflow Nov 26, 2017 by Lucas Trzesniewski • edited Jun 20, 2020 by Community
1

There are two reasons:

  1. Lazy uses more memory on the stack (it's just one more variable)
  2. Lazy adds more stack frames (since it invokes a delegate)

Read below for more details:

The amount of memory that is allocated to your call stack is fixed (this is the maxStackSize of your Thread)

So, the number of stack frames that are going to fit into that fixed amount of memory depends on the size of these stack frames.

If you are using additional variables inside your method, they have to be written to the stack, and they take up memory.

Also, the amount of stack frames will vary if you use a Lazy<T> because it holds a delegate which needs one more invocation (one more stack frame that you don't count)

This is exactly the case that you are encountering, if you use additional lazy variable inside CalculateWithLazy your stack frame just takes more space, that is why you get less stack frames before the program fails with StackOverflowException

It is possible to calculate this with more precision, but I think this approximate explanation is enough to understand the reason for different behavior.

Here is how you can find out what is maxStackSize of your thread: How to find current thread's max stack size in .net?

Here is how you can find out the size of the reference type variables (depends on platform + some overhead): How much memory does a C# reference consume?

Lastly, you only have System.Int32 in your code, so it takes 32 bytes of memory.
If you had any custom structs (value types) calculating their size would be quite a challenge, see the answer from @Hans Passant in this question: How do I check the number of bytes consumed by a structure?

answered on Stack Overflow Nov 18, 2017 by ironstone13 • edited Nov 18, 2017 by ironstone13
0

By using Lazy you are adding more calls: the call to the Value property, probably Invoke on the delegate and possibly more depending how it's implemented. Your debugger call stack could help visualize what's going on.

answered on Stack Overflow Nov 18, 2017 by Asik

User contributions licensed under CC BY-SA 3.0