How to diagnose a corrupted suffix pattern in a mixed managed/unmanaged 32-bit .NET application

I have a .NET application that P/Invokes several libraries, all 32-bit (the application is 32-bit as well). I recently started getting crashes that occurred when the GC started freeing memory, and when I attached a debugger I saw that it was an access violation. After some web searches, I set up gflags and WinDbg and was able to capture the actual problem:

===========================================================
VERIFIER STOP 0000000F: pid 0x9650: corrupted suffix pattern 

001B1000 : Heap handle
20A5F008 : Heap block
00000006 : Block size
20A5F00E : corruption address
===========================================================
This verifier stop is not continuable. Process will be terminated 
when you use the `go' debugger command.
===========================================================

After some more reading, I was able to get the allocation stack trace for the block:

0:009> !heap -p -a 20A5F008
address 20a5f008 found in
_HEAP @ f420000
  HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
    20a5efe0 0008 0000  [00]   20a5f008    00006 - (busy)
    Trace: 0a94
    60cba6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
    60cb8f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
    77e00d96 ntdll!RtlDebugAllocateHeap+0x00000030
    77dbaf0d ntdll!RtlpAllocateHeap+0x000000c4
    77d63cfe ntdll!RtlAllocateHeap+0x0000023a
    60cccb62 verifier!AVrfpRtlAllocateHeap+0x00000092
    7666ea43 ole32!CRetailMalloc_Alloc+0x00000016
    7666ea5f ole32!CoTaskMemAlloc+0x00000013
    6c40b25d clr!MngdNativeArrayMarshaler::ConvertSpaceToNative+0x000000bd

... and some more detailed information on the block entry:

0:009> !heap -i 20a5f008
Detailed information for block entry 20a5f008
Assumed heap       : 0x0f610000 (Use !heap -i NewHeapHandle to change)
Header content     : 0x00000000 0x00000001
Owning segment     : 0x0f610000 (offset 0)
Block flags        : 0x0 (free )
Total block size   : 0x0 units (0x0 bytes)
Previous block size: 0xb4e4 units (0x5a720 bytes)
Block CRC          : OK - 0x0  
List corrupted: (Blink->Flink = 00000000) != (Block = 20a5f010)
Free list entry    : CORRUPTED
Previous block     : 0x20a048e8
Next block         : 0x20a5f008

I'm stuck with this data. Unfortunately, ConvertSpaceToNative isn't an illuminating frame, since it covers pretty much every unmanaged allocation request. I've spent days reading documentation and trying to trace this back to the offending call, but I can't find a way to determine the actual source of the corruption. I've tried setting breakpoints and stepping through, but I can't find a way to verify the heap contents manually that actually works: every check reports that everything is okay. It also seems to me that turning on full page heap should make the application halt as soon as the corruption happens, but it still doesn't halt until the free call. This is the call stack when execution halts:

0:009> kL
ChildEBP RetAddr  
2354ecac 60cb9df2 verifier!VerifierStopMessage+0x1f8
2354ed10 60cba22a verifier!AVrfpDphReportCorruptedBlock+0x1c2
2354ed6c 60cba742 verifier!AVrfpDphCheckNormalHeapBlock+0x11a
2354ed8c 60cb90d3 verifier!AVrfpDphNormalHeapFree+0x22
2354edb0 77e01564 verifier!AVrfDebugPageHeapFree+0xe3
2354edf8 77dbac29 ntdll!RtlDebugFreeHeap+0x2f
2354eeec 77d634a2 ntdll!RtlpFreeHeap+0x5d
2354ef0c 60cccc4f ntdll!RtlFreeHeap+0x142
2354ef54 76676e6a verifier!AVrfpRtlFreeHeap+0x86
2354ef68 76676f54 ole32!CRetailMalloc_Free+0x1c
2354ef78 6c40b346 ole32!CoTaskMemFree+0x13
2354f008 231f7e8a clr!MngdNativeArrayMarshaler::ClearNative+0x78
WARNING: Frame IP not in any known module. Following frames may be wrong.
2354f08c 231f6442 0x231f7e8a
2354f154 231f5a7b 0x231f6442
2354f264 231f572b 0x231f5a7b
2354f288 231f56a4 0x231f572b
2354f2a4 231f7b3e 0x231f56a4
2354f330 231f207b 0x231f7b3e
2354f3b4 1edf60e0 0x231f207b
*** WARNING: Unable to verify checksum for     C:\Windows\assembly\NativeImages_v4.0.30319_32\mscorlib\045c9588954c3662d542b53f4462268b\mscorlib.ni.dll
2354f850 6a746ed4 0x1edf60e0
2354f85c 6a724157 mscorlib_ni+0x386ed4
2354f8c0 6a724096 mscorlib_ni+0x364157
2354f8d4 6a724051 mscorlib_ni+0x364096
2354f8f0 6a691cd2 mscorlib_ni+0x364051
2354f908 6c353e22 mscorlib_ni+0x2d1cd2
2354f914 6c363355 clr!CallDescrWorkerInternal+0x34
2354f968 6c366d1f clr!CallDescrWorkerWithHandler+0x6b
2354f9e0 6c4d29d6 clr!MethodDescCallSite::CallTargetWorker+0x152
2354fb54 6c3c8357 clr!ThreadNative::KickOffThread_Worker+0x19d
2354fb68 6c3c83c5 clr!Thread::DoExtraWorkForFinalizer+0x1ca
2354fc10 6c3c8492 clr!Thread::DoExtraWorkForFinalizer+0x256
2354fc6c 6c3c84ff clr!Thread::DoExtraWorkForFinalizer+0x615
2354fc90 6c4d2ad8 clr!Thread::DoExtraWorkForFinalizer+0x6b2
2354fd14 6c3fb4ad clr!ThreadNative::KickOffThread+0x1d2
2354feb0 60cd11d3 clr!Thread::intermediateThreadProc+0x4d
2354fee8 75c6336a verifier!AVrfpStandardThreadFunction+0x2f
2354fef4 77d69f72 KERNEL32!BaseThreadInitThunk+0xe
2354ff34 77d69f45 ntdll!__RtlUserThreadStart+0x70
2354ff4c 00000000 ntdll!_RtlUserThreadStart+0x1b

I feel like it's supposed to be obvious what I ought to be doing now, but no avenue of investigation is turning up anything to move me towards resolving the bug.

c#
pinvoke
windbg
unmanaged
asked on Stack Overflow Feb 12, 2014 by ianschol

1 Answer

I finally resolved this, and found that I had made one crucial mistake. With a corrupted suffix pattern, the error is only reported when the block is freed, which led me to believe that the allocation was unlikely to have come right before the free. That was wrong. When corruption is detected on free, barring further information, any allocation point is equally likely. In this case, the verifier halted while freeing the buffer for a parameter whose type had been incorrectly declared as a struct of shorts instead of a struct of ints.

Here's the offending code:

    [DllImport("gdi32.dll", CharSet = CharSet.Unicode)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool GetCharABCWidths(IntPtr hdc, uint uFirstChar, uint uLastChar, [Out] ABC[] lpabc);

(This declaration is okay)

    [StructLayout(LayoutKind.Sequential)]
    public struct ABC
    {
        public short A;
        public ushort B;
        public short C;
    }

(This is not okay. Per the MSDN article on the ABC struct, all three fields are 32-bit: abcA is an int, abcB is a UINT, and abcC is an int. See http://msdn.microsoft.com/en-us/library/windows/desktop/dd162454(v=vs.85).aspx )
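
For reference, a corrected declaration might look like the sketch below. It assumes the field layout from the MSDN article linked above (int, UINT, int). The `AbcLayoutCheck` class is a hypothetical helper added only to show the resulting marshaled size; it is not part of the original code.

```csharp
using System;
using System.Runtime.InteropServices;

// Corrected layout: three 32-bit fields, matching the native ABC struct.
[StructLayout(LayoutKind.Sequential)]
public struct ABC
{
    public int A;   // abcA (int)
    public uint B;  // abcB (UINT)
    public int C;   // abcC (int)
}

// Hypothetical helper, for illustration only.
public static class AbcLayoutCheck
{
    public static void Main()
    {
        // The marshaler sizes the native buffer from this value,
        // so it has to agree with what the native function writes.
        Console.WriteLine(Marshal.SizeOf(typeof(ABC))); // prints 12
    }
}
```

With this layout, the buffer the marshaler allocates for each array element is as large as the 12 bytes the native function fills in.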

So, if you find yourself debugging memory corruption that halts on free, keep in mind: never discount the possibility that the memory being freed was incorrectly allocated to begin with... and mind those [Out] parameters on unmanaged calls!
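
In hindsight, the numbers in the verifier stop were already consistent with a size mismatch. The sketch below redoes that arithmetic with the addresses from the question. The struct and class names are made up for the comparison, and reading the 6-byte block as a one-element array is an inference from those numbers, not something the debugger reports.

```csharp
using System;
using System.Runtime.InteropServices;

public static class VerifierStopArithmetic
{
    // The layout as originally (incorrectly) declared; hypothetical name.
    [StructLayout(LayoutKind.Sequential)]
    struct AbcAsDeclared { public short A; public ushort B; public short C; }

    // The layout with 32-bit fields; hypothetical name.
    [StructLayout(LayoutKind.Sequential)]
    struct AbcNative { public int A; public uint B; public int C; }

    public static void Main()
    {
        // Values copied from the VERIFIER STOP output in the question.
        const uint heapBlock = 0x20A5F008;
        const uint corruptionAddress = 0x20A5F00E;
        const uint blockSize = 0x6;

        // The corruption starts exactly where the user block ends.
        uint offset = corruptionAddress - heapBlock;
        Console.WriteLine(offset);              // prints 6
        Console.WriteLine(offset == blockSize); // prints True

        // The block is the size of one element of the wrong struct...
        int declared = Marshal.SizeOf(typeof(AbcAsDeclared));
        Console.WriteLine(declared);            // prints 6

        // ...while the native side writes one element of the real struct.
        int native = Marshal.SizeOf(typeof(AbcNative));
        Console.WriteLine(native);              // prints 12
        Console.WriteLine(native - declared);   // prints 6 (bytes past the end)
    }
}
```

A block size that matches `Marshal.SizeOf` for one of your own P/Invoke structs (or a small multiple of it) is a reasonable place to start looking when the allocation trace ends in the array marshaler.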

answered on Stack Overflow Feb 14, 2014 by ianschol

User contributions licensed under CC BY-SA 3.0