Random crashes on Windows 10 64bit with ATL subclassing


Up front: as of March 1st, 2017, Microsoft has confirmed this as a bug. See the comments at the end.

Short description:

I get random crashes in a larger application that uses MFC and ATL. In all such cases, after ATL subclassing was used for a window, simple actions on that window (moving, resizing, setting the focus, painting, etc.) crash at a random execution address.

At first it looked like a wild pointer or heap corruption, but I narrowed the complete scenario down to a very simple application using pure ATL and only the Windows API.

Requirements / scenarios I used:

  • The application was created with VS 2015 Enterprise Update 3.
  • The program should be compiled as 32bit.
  • Test application uses CRT as a shared DLL.
  • The application runs under Windows 10 Build 14393.693 64bit (but we have repros under Windows 8.1 and Windows Server 2012 R2, all 64bit)
  • atlthunk.dll has version 10.0.14393.0

What the application does:

It simply creates a frame window and then creates many static windows with the Windows API. After each static window is created, it is subclassed with the ATL CWindowImpl::SubclassWindow method. After the subclass operation, a simple window message is sent.

What happens:

Not on every run, but very often, the application crashes on SendMessage to the subclassed window. On the 257th window (or another multiple of 256, plus 1) the subclass fails in some way. The ATL thunk that is created is invalid. It seems that the stored execution address of the new subclass function isn't correct. Sending any message to the window causes a crash. The callstack is always the same. The last visible and known address in the callstack is in atlthunk.dll:

atlthunk.dll!AtlThunk_Call(unsigned int,unsigned int,unsigned int,long) Unknown
atlthunk.dll!AtlThunk_0x00(struct HWND__ *,unsigned int,unsigned int,long)  Unknown
user32.dll!__InternalCallWinProc@20()   Unknown
user32.dll!UserCallWinProcCheckWow()    Unknown
user32.dll!SendMessageWorker()  Unknown
user32.dll!SendMessageW()   Unknown
CrashAtlThunk.exe!WindowCheck() Line 52 C++

The thrown exception in the debugger is shown as:

Exception thrown at 0x0BF67000 in CrashAtlThunk.exe: 
0xC0000005: Access violation executing location 0x0BF67000.

or, in another sample:

Exception thrown at 0x2D75E06D in CrashAtlThunk.exe: 
0xC0000005: Access violation executing location 0x2D75E06D.

What I know about atlthunk.dll:

Atlthunk.dll seems to be part of 64-bit OS versions only. I found it on Win 8.1 and Win 10 systems.

If atlthunk.dll is available (as on all Windows 10 machines), this DLL handles the thunking. If the DLL isn't present, thunking is done in the standard way: allocating a block on the heap, marking it as executable, and writing a load and a jump instruction into it.
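The "standard way" can be made concrete with a byte-level sketch. This is a minimal illustration, assuming the classic 32-bit x86 thunk layout (a mov that overwrites the first stack argument with an object pointer, followed by a relative jmp). The names ThunkBytes, PutLE32 and BuildThunk are hypothetical and not taken from the ATL headers. The code only assembles the 13 bytes and never executes them, so it does not depend on Windows.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout (13 bytes):
//   C7 44 24 04 <imm32>   mov dword ptr [esp+4], objectAddress
//   E9 <rel32>            jmp procAddress
using ThunkBytes = std::array<std::uint8_t, 13>;

// Writes a 32-bit value in little-endian byte order at the given offset.
inline void PutLE32(ThunkBytes& out, std::size_t offset, std::uint32_t value) {
    assert(offset + 4 <= out.size());
    for (std::size_t i = 0; i < 4; ++i)
        out[offset + i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// thunkAddress:  where the 13 bytes will live (needed for the relative jump)
// objectAddress: the C++ object pointer that replaces the HWND argument
// procAddress:   the real window procedure the thunk jumps to
inline ThunkBytes BuildThunk(std::uint32_t thunkAddress,
                             std::uint32_t objectAddress,
                             std::uint32_t procAddress) {
    ThunkBytes t{};
    t[0] = 0xC7; t[1] = 0x44; t[2] = 0x24; t[3] = 0x04;  // mov [esp+4], imm32
    PutLE32(t, 4, objectAddress);
    t[8] = 0xE9;                                          // jmp rel32
    // The displacement is relative to the end of the thunk; unsigned
    // arithmetic wraps modulo 2^32, which is what a backward jump needs.
    PutLE32(t, 9, procAddress - (thunkAddress + 13u));
    return t;
}
```

In a real process the block holding these bytes would additionally have to be marked executable, which is the step the paragraph above refers to.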

If the DLL is present, it contains 256 predefined slots for subclassing. Once 256 subclasses are in use, the DLL loads itself into memory a second time and uses the next 256 available slots in that copy.
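The slot arithmetic explains why the failures show up at window 257, 513 and so on. The sketch below models it under the assumption stated above (256 slots per mapped copy, handed out in order); ThunkLocation and LocateThunk are hypothetical names used only for illustration, not part of atlthunk.dll.

```cpp
#include <cassert>
#include <cstddef>

// Assumption from the description above: every mapped copy of
// atlthunk.dll offers 256 thunk slots.
constexpr std::size_t kSlotsPerCopy = 256;

struct ThunkLocation {
    std::size_t copy;  // 0 = first (normally loaded) copy, 1+ = extra copies
    std::size_t slot;  // slot index inside that copy, 0..255
};

// Maps the 1-based number of a subclassed window to the copy and slot
// that would serve it in this model.
inline ThunkLocation LocateThunk(std::size_t windowNumber) {
    assert(windowNumber >= 1 && "window numbers are 1-based");
    const std::size_t index = windowNumber - 1;
    return { index / kSlotsPerCopy, index % kSlotsPerCopy };
}
```

In this model window 256 is the last one served by the first copy, and window 257 is the first one served by a second copy, which matches the observed crash point.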

As far as I can see, atlthunk.dll belongs to Windows 10 itself and is neither exchangeable nor redistributable.

Things checked:

  • Turning the antivirus system off or on made no difference.
  • Data execution prevention doesn't matter (with /NXCOMPAT:NO and the EXE defined as an exclusion in the system settings, it still crashes).
  • Additional calls to FlushInstructionCache or Sleep after the subclass don't have any effect.
  • Heap integrity isn't a problem here; I rechecked it with more than one tool.
  • and a thousand more (I may already have forgotten what I tested)... ;)

Reproducibility:

The problem is reproducible to some degree. It doesn't crash all the time; it crashes randomly. I have a machine where the code crashes on every third execution.

I can repro it on two desktop machines, one with an i7-4770 and one with an i7-6700.

Other machines seem not to be affected at all (it always works on a laptop with an i3-3217 and on a desktop with an i7-870).

About the sample:

For simplicity I use an SEH handler to catch the error. If you debug the application, the debugger will show the callstack mentioned above. The program can be launched with an integer on the command line. In this case the program launches itself again with the count decremented by 1. So if you launch CrashAtlThunk 100, it will launch the application 100 times. On an error, the SEH handler catches it and shows the text "Crash" in a message box. If the application runs without errors, it shows "Succeeded" in a message box. If the application is started without a parameter, it is executed just once.

Questions:

  • Can anybody else repro this?
  • Has anybody seen similar effects?
  • Does anybody know, or can anybody imagine, a reason for this?
  • Does anybody know how to get around this problem?

Notes:

2017-01-20 Support case at Microsoft opened.

The code

// CrashAtlThunk.cpp : Defines the entry point for the application.
//

// Windows Header Files:
#include <windows.h>

// C RunTime Header Files
#include <stdlib.h>
#include <malloc.h>
#include <memory.h>
#include <tchar.h>

#define _ATL_CSTRING_EXPLICIT_CONSTRUCTORS      // some CString constructors will be explicit

#include <atlbase.h>
#include <atlstr.h>
#include <atlwin.h>


// Global Variables:
HINSTANCE hInst;                                // current instance

const int NUM_WINDOWS = 1000;

//------------------------------------------------------
//    The problematic code
//        After the 256th subclass the application randomly crashes.

class CMyWindow : public CWindowImpl<CMyWindow>
{
public:
    virtual BOOL ProcessWindowMessage(_In_ HWND hWnd, _In_ UINT uMsg, _In_ WPARAM wParam, _In_ LPARAM lParam, _Inout_ LRESULT& lResult, _In_ DWORD dwMsgMapID) override
    {
        return FALSE;
    }
};

void WindowCheck()
{
    HWND ahwnd[NUM_WINDOWS];
    CMyWindow subclass[_countof(ahwnd)];

    HWND hwndFrame;
    ATLVERIFY(hwndFrame = ::CreateWindow(_T("Static"), _T("Frame"), SS_SIMPLE, 0, 0, 10, 10, NULL, NULL, hInst, NULL));

    for (int i = 0; i<_countof(ahwnd); ++i)
    {
        ATLVERIFY(ahwnd[i] = ::CreateWindow(_T("Static"), _T("DummyWindow"), SS_SIMPLE|WS_CHILD, 0, 0, 10, 10, hwndFrame, NULL, hInst, NULL));
        if (ahwnd[i])
        {
            subclass[i].SubclassWindow(ahwnd[i]);
            ATLVERIFY(SendMessage(ahwnd[i], WM_GETTEXTLENGTH, 0, 0)!=0);
        }
    }
    for (int i = 0; i<_countof(ahwnd); ++i)
    {
        if (ahwnd[i])
            ::DestroyWindow(ahwnd[i]);
    }
    ::DestroyWindow(hwndFrame);
}
//------------------------------------------------------

int APIENTRY wWinMain(_In_ HINSTANCE hInstance,
                     _In_opt_ HINSTANCE hPrevInstance,
                     _In_ LPWSTR    lpCmdLine,
                     _In_ int       nCmdShow)
{
    hInst = hInstance; 

    int iCount = _tcstol(lpCmdLine, nullptr, 10);

    __try
    {
        WindowCheck();
        if (iCount==0)
        {
            ::MessageBox(NULL, _T("Succeeded"), _T("CrashAtlThunk"), MB_OK|MB_ICONINFORMATION);
        }
        else
        {
            TCHAR szFileName[_MAX_PATH];
            TCHAR szCount[16];
            _itot_s(--iCount, szCount, 10);
            ::GetModuleFileName(NULL, szFileName, _countof(szFileName));
            ::ShellExecute(NULL, _T("open"), szFileName, szCount, nullptr, SW_SHOW);
        }
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        ::MessageBox(NULL, _T("Crash"), _T("CrashAtlThunk"), MB_OK|MB_ICONWARNING);
        return FALSE;
    }

    return 0;
}

Comment after Eugene's answer (Feb. 24th 2017):

I don't want to change my original question, but I want to add some information on how to turn this into a 100% repro.

First, change the main function to:

int APIENTRY wWinMain(_In_ HINSTANCE hInstance,
                     _In_opt_ HINSTANCE hPrevInstance,
                     _In_ LPWSTR    lpCmdLine,
                     _In_ int       nCmdShow)
{
    // Get the load address of ATLTHUNK.DLL
    // HMODULE hMod = LoadLibrary(_T("atlThunk.dll"));

    // Now allocate a page at the preferred start address
    void* pMem = VirtualAlloc(reinterpret_cast<void*>(0x0f370000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    DWORD dwLastError = ::GetLastError();

    hInst = hInstance; 

    WindowCheck();

    return 0;
}
  1. Uncomment the LoadLibrary call. Compile.

  2. Run the program once and stop in the debugger. Note the address where the library was loaded (hMod).

  3. Stop the program. Now comment out the LoadLibrary call again and change the VirtualAlloc call to use the previous hMod value as its address; this is the preferred load address in this Windows session.

  4. Recompile and run. CRASH!

Thanks to Eugene.

Up to now, Microsoft is still investigating this. They have the dumps and all the code, but I don't have a final answer. The fact is that we have a fatal bug in some 64-bit Windows versions.

I have currently made the following changes to work around this:

  1. Open atlstdthunk.h of VS-2015.

  2. Comment out the entire #ifdef block that defines USE_ATL_THUNK2 (code lines 25 to 27).

  3. Recompile your program.

This enables the old thunking mechanism well known from VC-2010, VC-2013... and this works crash-free for me. It holds only as long as no other precompiled libraries are involved that might subclass or use more than 256 windows via ATL in any way.

Comment (Mar. 1st 2017):

  • Microsoft confirmed that this is a bug. It should be fixed in Windows 10 RS2.
  • Microsoft agrees that editing atlstdthunk.h is a valid workaround for the problem.

In effect this means: as long as there is no stable patch, I can never use the normal ATL thunking again, because I can never know which Windows versions out in the world will run my program. Windows 8, Windows 8.1, and Windows 10 prior to RS2 all suffer from this bug.

Final Comment (Mar. 9th 2017):

  • Builds with VS-2017 are affected too, there is no difference between VS-2015 and VS-2017
  • Microsoft decided that there will be no fix for older OS, regarding this case.
  • Neither Windows 8.1, Windows Server 2012 R2, nor other Windows 10 builds will get a patch to fix this issue.
  • The issue is too rare and the impact on our company is too small. Also, the fix on our side is too simple. Other reports of this bug are not known.
  • The case is closed.

My advice for all programmers: change atlstdthunk.h in your Visual Studio version, VS-2015 or VS-2017 (see above). I don't understand Microsoft. This bug is a serious problem in the ATL thunking. It may hit every programmer who uses a larger number of windows and/or subclassing.

We only know of a fix in Windows 10 RS2. So all older OS are affected! So I recommend to disable the use of the atlthunk.dll by commenting out the define noted above.

c++
visual-studio-2015
windows-10
atl
visual-studio-2017
asked on Stack Overflow Jan 19, 2017 by xMRi • edited Mar 10, 2017 by xMRi

2 Answers


This is a bug inside atlthunk.dll. When it loads itself a second time (and any further times), it does so manually via a MapViewOfFile call. In this case not every address relative to the module base is properly adjusted (when a DLL is loaded via LoadLibrary/LoadLibraryEx, the system loader does this automatically). If the DLL was loaded at its preferred base address the first time, everything works fine, because the unchanged addresses still point to equivalent code or data. If not, you get a crash when the 257th subclassed window handles messages.
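The effect of a missed adjustment can be sketched with plain address arithmetic. This is a simplified model, not the actual atlthunk.dll logic: Rebase, PointsInto and StalePointerStillWorks are hypothetical helpers, and all addresses in the usage note are made-up sample values.

```cpp
#include <cassert>
#include <cstdint>

// What a loader does for each absolute address stored in an image that
// ends up at actualBase instead of preferredBase.
inline std::uint32_t Rebase(std::uint32_t storedAddress,
                            std::uint32_t preferredBase,
                            std::uint32_t actualBase) {
    return storedAddress + (actualBase - preferredBase);
}

// True when address lies inside [base, base + size). The unsigned
// subtraction wraps, so addresses below base are rejected as well.
inline bool PointsInto(std::uint32_t address, std::uint32_t base,
                       std::uint32_t size) {
    return address - base < size;
}

// An address that was never adjusted still refers to the preferred base
// region. In this model it only "works" if the first copy of the DLL
// really sits there.
inline bool StalePointerStillWorks(std::uint32_t staleAddress,
                                   std::uint32_t firstCopyBase,
                                   std::uint32_t imageSize) {
    assert(imageSize > 0);
    return PointsInto(staleAddress, firstCopyBase, imageSize);
}
```

With a preferred base of 0x0F370000 and an image size of 0x10000, a stale address 0x0F371234 still lands in the first copy when that copy was loaded at 0x0F370000, but not when the first copy was pushed to, say, 0x0BF60000. In that second case the address points at whatever else occupies the preferred region, which fits the "access violation executing location" errors in the question.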

Since Vista we have the "address space layout randomization" feature, which explains why your code crashes randomly. To get a crash every time, you have to find the atlthunk.dll base address on your OS (it differs between OS versions) and reserve one memory page of address space at this address with a VirtualAlloc call before the first subclass. To find the base address you can use the dumpbin /headers atlthunk.dll command or parse the PE headers manually.
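Parsing the PE headers manually takes only a few reads. The following sketch works on a byte buffer (for example the contents of the DLL file read into memory) and uses the documented PE/COFF offsets; ReadLE, ReadPreferredImageBase and MakeMinimalPe32Header are hypothetical names, and the last one only builds a synthetic header for trying the parser out. Note that this yields the ImageBase recorded in the file, which is not necessarily where the OS maps the DLL in a given session.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian integer of `width` bytes, with bounds checking.
inline bool ReadLE(const std::vector<std::uint8_t>& data, std::size_t offset,
                   std::size_t width, std::uint64_t& value) {
    if (offset > data.size() || width > data.size() - offset) return false;
    value = 0;
    for (std::size_t i = 0; i < width; ++i)
        value |= static_cast<std::uint64_t>(data[offset + i]) << (8 * i);
    return true;
}

// Extracts the ImageBase field from a PE32 or PE32+ image.
inline bool ReadPreferredImageBase(const std::vector<std::uint8_t>& image,
                                   std::uint64_t& imageBase) {
    std::uint64_t v = 0, peOffset = 0, magic = 0;
    if (!ReadLE(image, 0, 2, v) || v != 0x5A4D) return false;      // "MZ"
    if (!ReadLE(image, 0x3C, 4, peOffset)) return false;           // e_lfanew
    const std::size_t pe = static_cast<std::size_t>(peOffset);
    if (!ReadLE(image, pe, 4, v) || v != 0x00004550) return false; // "PE\0\0"
    const std::size_t opt = pe + 4 + 20;  // signature + COFF file header
    if (!ReadLE(image, opt, 2, magic)) return false;
    if (magic == 0x10B) return ReadLE(image, opt + 28, 4, imageBase); // PE32
    if (magic == 0x20B) return ReadLE(image, opt + 24, 8, imageBase); // PE32+
    return false;
}

// Test helper: builds just enough of a PE32 header for the parser above.
inline std::vector<std::uint8_t> MakeMinimalPe32Header(std::uint32_t imageBase) {
    const std::size_t pe = 0x40, opt = pe + 24;
    std::vector<std::uint8_t> image(opt + 32, 0);
    image[0] = 'M'; image[1] = 'Z';
    image[0x3C] = static_cast<std::uint8_t>(pe);
    image[pe] = 'P'; image[pe + 1] = 'E';
    image[opt] = 0x0B; image[opt + 1] = 0x01;  // magic 0x10B
    for (std::size_t i = 0; i < 4; ++i)
        image[opt + 28 + i] = static_cast<std::uint8_t>(imageBase >> (8 * i));
    assert(image.size() == opt + 32);
    return image;
}
```

The value returned for a real atlthunk.dll should match the "image base" line that dumpbin /headers prints for the same file.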

My tests show that on Windows 10 build 14393.693 the x32 version is affected but the x64 version is not. On Server 2012 R2 with the latest updates, both versions (x32 and x64) are affected.

BTW, the atlthunk.dll code executes around 10 times more CPU instructions per thunk call than the previous implementation. This may not be very significant, but it does slow down message processing.

answered on Stack Overflow Feb 23, 2017 by Eugene • edited Feb 23, 2017 by Eugene

Slightly more automatic form of what was already described:

// A minimum ATL program with more than 256 windows. In practise they would not be toplevel, but e.g. buttons.
// Thanks to https://www.codeguru.com/cpp/com-tech/atl/article.php/c3605/Using-the-ATL-Windowing-Classes.htm
// for helping with ATL.
// You need to be up to date, like have KB3030947 or KB3061512. Otherwise asserts will fail instead.
#undef _DEBUG
#include <atlbase.h>
ATL::CComModule _Module;
#include <atlwin.h>
#include <assert.h>
#include <string>

BEGIN_OBJECT_MAP(ObjectMap) END_OBJECT_MAP()

struct CMyWindow : CWindowImpl<CMyWindow>
{
    BEGIN_MSG_MAP(CMyWindow) END_MSG_MAP()
};

int __cdecl wmain()
{
    // Provoke the problem, which otherwise only shows up by chance.
    PROCESS_INFORMATION process = { 0 };
    {
        // Be sure another process has atlthunk loaded.
        WCHAR cmd[] = L"rundll32 atlthunk,x";
        STARTUPINFOW startup = { sizeof(startup) };
        BOOL success = CreateProcessW(0, cmd, 0, 0, 0, 0, 0, 0, &startup, &process);
        assert(success && process.hProcess);
        CloseHandle(process.hThread);
        // Get atlthunk's usual address.
        HANDLE file = CreateFileW((std::wstring(_wgetenv(L"SystemRoot")) + L"\\system32\\atlthunk.dll").c_str(), GENERIC_READ,
            FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
        assert(file != INVALID_HANDLE_VALUE);
        HANDLE mapping = CreateFileMappingW(file, 0, PAGE_READONLY | SEC_IMAGE, 0, 0, 0);
        assert(mapping);
        void* view = MapViewOfFile(mapping, 0, 0, 0, 0);
        assert(view);
        UnmapViewOfFile(view);
        VirtualAlloc(view, 1, MEM_COMMIT | MEM_RESERVE, PAGE_NOACCESS);
    }
    _Module.Init(0, 0);
    const int N = 300;
    CMyWindow wnd[N];
    for (int i = 0; i < N; ++i)
    {
        wnd[i].Create(0, CWindow::rcDefault, L"Hello", (i < N - 1) ? 0 : (WS_OVERLAPPEDWINDOW | WS_VISIBLE));
        wnd[i].DestroyWindow();
    }
    TerminateProcess(process.hProcess, 0);
    CloseHandle(process.hProcess);
    MSG msg;
    while (GetMessageW(&msg, 0, 0, 0))
    {
        TranslateMessage(&msg);
        DispatchMessageW(&msg);
    }
    _Module.Term();
}
answered on Stack Overflow Oct 6, 2020 by Jay K

User contributions licensed under CC BY-SA 3.0