Record instructions stepping/tracing in a multithreaded application on a multi-core system

1

I created an extension for WinDbg using DbgEng, which records trace containing the state of the registers and stack for each executed instruction in user mode and in kernel mode.
For step-tracing, I use breakpoints (I also tried sets the TRAP flag to threads context for which a trace is recorded, but it gave the same result). After each step and hitting a breakpoint IDebugEventCallbacks::Breakpoint() is called, which returns DEBUG_STATUS_BREAK. This causes the debugger to complete the IDebugControl::WaitForEvent() function in the engine loop and I record the state for the current instruction.
The trace is recorded in a test multithreaded application, on a multi-core system (4-physical, 8-virtual cores). The test application creates 6 threads apart from the main one. 4 threads record trace of the same function func0 asynchronously. The other 3 created threads simulate calculations.
Code of the test program:

void func3();
void func2();
void func1();
DWORD WINAPI func0(LPVOID lpArg);
DWORD WINAPI LoopProcessing(LPVOID lpArg);

void func3()
{
    int a = 1;
    a += 2;
    a *= 60;
    a += 31;
    printf(u8"func3\n");
    a /= 2;
    a *= 777;
    a -= 31;
}

void func2()
{
    int a = 1;
    a += 5;
    a *= 41;
    a += 55;
    printf(u8"func2\n");
    a /= 3;
    a *= 414;
    a -= 41;
}

void func1()
{
    int a = 1;
    a += 95;
    a *= 14;
    a += 72;
    printf(u8"func1\n");
    a /= 1;
    a *= 42;
    a -= 37;
}

DWORD WINAPI func0(LPVOID lpArg)
{
    func1();
    func2();
    func3();

    return 0;
}

DWORD WINAPI LoopProcessing(LPVOID lpArg)
{
    int a = 1,
        b = 2,
        c = 3;

    b = 2 * c + 1;
    c = a + b * c;
    a = b * c + a;
    SwitchToThread();

    return 0;
}

int main(int argc, char* argv[])
{
    const size_t szThreadsCount = 6;
    HANDLE hThreads[szThreadsCount];
    
    for (int iIndexIteration = 0; iIndexIteration < 15; iIndexIteration++)
    {
        printf("====================== Iteration %i start ======================\n", iIndexIteration);

        hThreads[0] = CreateThread(NULL, NULL, func0, NULL, NULL, NULL);
        hThreads[1] = CreateThread(NULL, NULL, LoopProcessing, NULL, NULL, NULL);
        hThreads[2] = CreateThread(NULL, NULL, func0, NULL, NULL, NULL);
        hThreads[3] = CreateThread(NULL, NULL, LoopProcessing, NULL, NULL, NULL);
        hThreads[4] = CreateThread(NULL, NULL, func0, NULL, NULL, NULL);
        hThreads[5] = CreateThread(NULL, NULL, LoopProcessing, NULL, NULL, NULL);

        for (int iIndex = 0; iIndex < 3; iIndex++)
        {
            func0(0);
        }

        WaitForMultipleObjects(szThreadsCount, hThreads, TRUE, INFINITE);

        for (int iIndex = 0; iIndex < szThreadsCount; iIndex++)
        {
            CloseHandle(hThreads[iIndex]);
            hThreads[iIndex] = nullptr;
        }
    }

    return 0;
}

In most cases, everything works as expected, and I get a fully recorded trace like this:

…
036|00007ff6`ad314215 call    Saek0!ILT+550(__CheckForDebuggerJustMyCode) (00007ff6`ad30122b)
Registers:
rax: 0x00000000CCCCCCCC rbx: 0x0000000000000000 rcx: 0x00007FF6AD33208C
rdx: 0x00007FF6AD30155F rsi: 0x0000000000000000 rdi: 0x0000000000A2FE18
rip: 0x00007FF6AD314215 rsp: 0x0000000000A2FD10 rbp: 0x0000000000A2FD30
r8:  0x0000000000000001 r9:  0x00007FF6AD30155F r10: 0x0000000000000000
r11: 0x0000000000000000 r12: 0x0000000000000000 r13: 0x0000000000000000
r14: 0x0000000000000000 r15: 0x0000000000000000
EFlags: 0xCCCCCCCC00000302
 
 
037|00007ff6`ad31421a mov     dword ptr [rbp+4],1     ss:00000000`00000004=????????
Registers:
rax: 0x0000000000000001 rbx: 0x0000000000000000 rcx: 0x00007FF6AD33208C
rdx: 0x00007FF6AD30155F rsi: 0x0000000000000000 rdi: 0x0000000000A2FE18
rip: 0x00007FF6AD31421A rsp: 0x0000000000A2FD10 rbp: 0x0000000000A2FD30
r8:  0x0000000000000001 r9:  0x00007FF6AD30155F r10: 0x0000000000000000
r11: 0x0000000000000000 r12: 0x0000000000000000 r13: 0x0000000000000000
r14: 0x0000000000000000 r15: 0x0000000000000000
EFlags: 0xCCCCCCCC00000202
 
 
038|00007ff6`ad314221 mov     eax,dword ptr [rbp+4]   ss:00000000`00000004=????????
Registers:
rax: 0x0000000000000001 rbx: 0x0000000000000000 rcx: 0x00007FF6AD33208C
rdx: 0x00007FF6AD30155F rsi: 0x0000000000000000 rdi: 0x0000000000A2FE18
rip: 0x00007FF6AD314221 rsp: 0x0000000000A2FD10 rbp: 0x0000000000A2FD30
r8:  0x0000000000000001 r9:  0x00007FF6AD30155F r10: 0x0000000000000000
r11: 0x0000000000000000 r12: 0x0000000000000000 r13: 0x0000000000000000
r14: 0x0000000000000000 r15: 0x0000000000000000
EFlags: 0xCCCCCCCC00000302
…

But sometimes the debugger skips steps and doesn't trigger breakpoints.
For example:

00007ff6`ad314215 call    Saek0!ILT+550(__CheckForDebuggerJustMyCode) (00007ff6`ad30122b)
00007ff6`ad31421a mov     dword ptr [rbp+4],1
00007ff6`ad314221 mov     eax,dword ptr [rbp+4]

Suppose the state rip = 00007ff6'ad314215 has been recorded and a breakpoint is set at the following address 00007ff6'ad31421a. After that, I continue the execution of the program and expect for the breakpoint at address 00007ff6'ad31421a to be triggered. But instead of the expected hit to breakpoint at 00007ff6'ad31421a and call IDebugEventCallbacks::Breakpoint(), my current rip = 00007ff6'ad314221 and I get a call to IDebugEventCallbacks::Exception() with the following parameters:

Exception code: (0x80000003) == EXCEPTION_BREAKPOINT
Exception address: 0x7FF6AD314221 - address of the next instruction after the breakpoint.
threadContext.Rip: 0x7FF6AD314221 - instruction address of current thread context.

So the system has already executed the instruction at 00007ff6'ad31421a, but for some reason the debugger didn't hit the set breakpoint and raised an exception on the next instruction(00007ff6`ad314221) where any breakpoint was not set.

I also tried checking the program counter (threadContext.rip) in IDebugEventCallbacks::ChangeEngineState() for the thread whose execution is being recorded, but I never got a counter there equal to the address of the instruction that was skipped (00007ff6'ad31421a).

As a result, there is no state for address 00007ff6'ad31421a in my trace:

…
036|00007ff6`ad314215 call    Saek0!ILT+550(__CheckForDebuggerJustMyCode) (00007ff6`ad30122b)
Registers:
rax: 0x00000000CCCCCCCC rbx: 0x0000000000000000 rcx: 0x00007FF6AD33208C
rdx: 0x00007FF6AD30155F rsi: 0x0000000000000000 rdi: 0x0000000000A2FE18
rip: 0x00007FF6AD314215 rsp: 0x0000000000A2FD10 rbp: 0x0000000000A2FD30
r8:  0x0000000000000001 r9:  0x00007FF6AD30155F r10: 0x0000000000000000
r11: 0x0000000000000000 r12: 0x0000000000000000 r13: 0x0000000000000000
r14: 0x0000000000000000 r15: 0x0000000000000000
EFlags: 0xCCCCCCCC00000302
 
 
037|00007ff6`ad314221 mov     eax,dword ptr [rbp+4]   ss:00000000`00000004=????????
Registers:
rax: 0x0000000000000001 rbx: 0x0000000000000000 rcx: 0x00007FF6AD33208C
rdx: 0x00007FF6AD30155F rsi: 0x0000000000000000 rdi: 0x0000000000A2FE18
rip: 0x00007FF6AD314221 rsp: 0x0000000000A2FD10 rbp: 0x0000000000A2FD30
r8:  0x0000000000000001 r9:  0x00007FF6AD30155F r10: 0x0000000000000000
r11: 0x0000000000000000 r12: 0x0000000000000000 r13: 0x0000000000000000
r14: 0x0000000000000000 r15: 0x0000000000000000
EFlags: 0xCCCCCCCC00000302
…

Questions:

  1. What can be the reason for this behavior?
  2. I have met the opinion that this behavior can be caused by execution on a multi-core system, but the author of this judgment didn't refer to any sources where this was somehow justified. If this is indeed the case, can someone provide links to the rationale?
  3. Any suggestions to solve the problem?
  4. Are there tools that allow you to record a similar trace with states in Windows as in user mode and in kernel mode?
multithreading
debugging
windbg
trace
dbgeng
asked on Stack Overflow Nov 20, 2020 by M3ik Shizuka

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0