UWP crash in ntdll.dll

2

I've got a UWP (C#) app that's running in production on a remote machine (under windows 10) but it periodically crashes. My client says, somewhat arbitrarily, every 9 hours or so.

I have several .wer files from the previous crashes but did not have a minidump, the paths referenced in the event viewer entry for the crash are blank other than the WER files. See edits below for how a minidump was obtained and findings.

The exception is an access violation (0xc0000005) at exception offset 0x0004df23 in ntdll.dll

I have the full source for the application and can run it in debug for long periods without the crash.

If I use DLL Export Viewer and load the exact version of ntdll.dll (copied from the remote machine) then I can see that at relative address 0x0004dc60 is EtwNotificationRegister and at 0x0004e260 is LdrGetDllPath.

Does this mean that my crash is occurring within a line of code in EtwNotificationRegister (which in turn is invoked by something within our code; however very difficult to trace without stack/minidump) I am not sure if the layout of a dll is such that the address I have can be placed like that?

Edit 2 as per @Raymond: No. There are almost certainly other non-exported functions between EtwNotificationRegister and LdrGetDllPath. On build 17763.475, offset 4df23 is RtlpWaitOnCriticalSection, so you are probably using an uninitialized critical section or an already-deleted critical section.

Is there any way I can extract more detail about this crash? I have remote access to the computer running the app but the crash does not appear to be triggered by a particular event (e.g. we can't hit a button and cause the crash)

Using a minidump now

I am running the program in both local debug as well. I have a remote debugger to the remote process but can't seem to break or inspect threads, not sure why. Just redeployed with symbols and the debugger attaches no problem but it just skips all breakpoints :(

Our own (rather naive) local log file, originally intended for only local debugging is written with a StreamWriter.WriteLine and immediately followed with a StreamWriter.Flush (wrapped in a try catch since that's not thread safe) just ends at a normal event on the remote machine - there is nothing following this normal event. We catch App_UnhandledException and write to this log so I'd have expected a stack here.

In Unexplained crashes related to ntdll.dll it is suggested that a crash from ntdll.dll is a canary in a coalmine Unexplained crashes related to ntdll.dll

Edit 1: I have configured an auto crash dump as per https://www.meziantou.net/2018/06/04/tip-automatically-create-a-crash-dump-file-on-error so if I can get it to crash again maybe I'll get a dump file next time?

Here is the detail from the WER

Version=1
EventType=MoAppCrash
EventTime=132017523132123596
ReportType=2
Consent=1
UploadTime=132017523137590717
ReportStatus=268435456
ReportIdentifier=8d467f04-4bdd-4f9e-bf26-b42d143ece1a
IntegratorReportIdentifier=b60f9ca0-4126-4262-a886-98d3844892d3
Wow64Host=34404
NsAppName=praid:App
OriginalFilename=XXXXXX.YYYYYY.exe
AppSessionGuid=00001514-0001-0004-9fe2-6df11905d501
TargetAppId=U:XXXXXX.YYYYYY_1.0.201.0_x64__b0abmt6f49vqj!App
TargetAppVer=1.0.201.0_x64_!2018//01//24:08:17:16!1194d!XXXXXX.YYYYYY.exe
BootId=4294967295
TargetAsId=1298
UserImpactVector=271582000
IsFatal=1
EtwNonCollectReason=4
Response.BucketId=2ee79f27e2e81a541d6200d746866340
Response.BucketTable=5
Response.LegacyBucketId=2117255699418735424
Response.type=4
Sig[0].Name=Package Full Name
Sig[0].Value=XXXXXX.YYYYYY_1.0.201.0_x64__b0abmt6f49vqj
Sig[1].Name=Application Name
Sig[1].Value=praid:App
Sig[2].Name=Application Version
Sig[2].Value=1.0.0.0
Sig[3].Name=Application Timestamp
Sig[3].Value=5a68410c
Sig[4].Name=Fault Module Name
Sig[4].Value=ntdll.dll
Sig[5].Name=Fault Module Version
Sig[5].Value=10.0.17763.475
Sig[6].Name=Fault Module Timestamp
Sig[6].Value=3230aa04
Sig[7].Name=Exception Code
Sig[7].Value=c0000005
Sig[8].Name=Exception Offset
Sig[8].Value=000000000004df23
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=10.0.17763.2.0.0.256.48
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=5129
DynamicSig[22].Name=Additional Information 1
DynamicSig[22].Value=95b1
DynamicSig[23].Name=Additional Information 2
DynamicSig[23].Value=95b15a88b673e33a5f48839974790b1c
DynamicSig[24].Name=Additional Information 3
DynamicSig[24].Value=283d
DynamicSig[25].Name=Additional Information 4
DynamicSig[25].Value=283dea7b6b6112710c1e3f76ed84d993

Dll Export Viewer / AppCrashView

Edit 3: screenshot of minidump from a crash last night. In the event log, the WER crash looks the same so this appears to be the same issue. I will see if I can load symbols etc.

minidump summary

Edit 4: Attempting to debug managed. Threads view shows a thread as the exception point but no call stack info.

no managed code

Edit 5: Debugging native from the minidump. Looks like we have a winner. @Raymond was correct, it was RtlpWaitOnCriticalSection invoked from BluetoothLEAdvertismentWatcher::AdvertismentReceivedCallbackWorker

debugging native

Native call stack as text:

Not Flagged > 8748 0 Worker Thread Win64 Thread Windows.Devices.Bluetooth.dll!(void) ntdll.dll!RtlpWaitOnCriticalSection() ntdll.dll!RtlpEnterCriticalSectionContended() ntdll.dll!RtlEnterCriticalSection() Windows.Devices.Bluetooth.dll!(void)() Windows.Devices.Bluetooth.dll!wil::ResultFromException<(void)

() Windows.Devices.Bluetooth.dll!Windows::Devices::Bluetooth::Advertisement::BluetoothLEAdvertisementWatcher::AdvertisementReceivedCallbackWorker(void) Windows.Devices.Bluetooth.dll!Windows::Devices::Bluetooth::Advertisement::BluetoothLEAdvertisementWatcher::AdvertisementReceivedThreadpoolWorkCallbackStatic(struct _TP_CALLBACK_INSTANCE *,void *,struct _TP_WORK *) ntdll.dll!TppWorkpExecuteCallback() ntdll.dll!TppWorkerThread() kernel32.dll!BaseThreadInitThunk() ntdll.dll!RtlUserThreadStart()

Edit 6: okay so now, what do I do? How can I resolve this problem? My understanding of the stack is it looks like an exception was thrown inside the callback? Is that correct? So I could put a managed try/catch in the BLE advertisment callback handler and that should (catch - for further debugging) fix it?

Edit 7: code... Here is the code we use to instantiate the wrapper and subscribe to events. The BluetoothLEAdvertisementWatcherWrapper is a delgating class (e.g. it just wraps the underlying BluetoothLEAdvertisementWatcher so it can implement an interface; it simply passes all events through and exposes properties. We do this so that we can have a different version that creates virtual events for testing)

bluetoothAdvertisementWatcher = new BluetoothLEAdvertisementWatcherWrapper();
bluetoothAdvertisementWatcher.SignalStrengthFilter.SamplingInterval = TimeSpan.Zero;
bluetoothAdvertisementWatcher.ScanningMode = BluetoothLEScanningMode.Active;
bluetoothAdvertisementWatcher.Received += Watcher_Received;
bluetoothAdvertisementWatcher.Stopped += Watcher_Stopped;
bluetoothAdvertisementWatcher.Start();

Here is the code for the wrapper; just to show it's not doing anything complex:

public class BluetoothLEAdvertisementWatcherWrapper : IBluetoothAdvertismentWatcher, IDisposable
    {
        private BluetoothLEAdvertisementWatcher bluetoothWatcher;
        public BluetoothLEAdvertisementWatcherWrapper()
        {
            bluetoothWatcher = new BluetoothLEAdvertisementWatcher();
        }

        public BluetoothSignalStrengthFilter SignalStrengthFilter => bluetoothWatcher.SignalStrengthFilter;

        public BluetoothLEScanningMode ScanningMode
        {
            get
            {
                return bluetoothWatcher.ScanningMode;
            }
            set
            {
                bluetoothWatcher.ScanningMode = value;
            }
        }

        public event TypedEventHandler<BluetoothLEAdvertisementWatcher, BluetoothLEAdvertisementReceivedEventArgs> Received
        {
            add
            {
                bluetoothWatcher.Received += value;
            }
            remove
            {
                bluetoothWatcher.Received -= value;
            }
        }

        public event TypedEventHandler<BluetoothLEAdvertisementWatcher, BluetoothLEAdvertisementWatcherStoppedEventArgs> Stopped
        {
            add
            {
                bluetoothWatcher.Stopped += value;
            }
            remove
            {
                bluetoothWatcher.Stopped -= value;
            }
        }

        public BluetoothLEAdvertisementWatcherStatus Status => bluetoothWatcher.Status;

        public Action<IPacketFrame, short> YieldAdvertisingPacket { get => throw new NotImplementedException(); set => throw new NotImplementedException(); }

        public void Start()
        {
            bluetoothWatcher.Start();
        }

        public void Stop()
        {
            bluetoothWatcher.Stop();
        }

        public void Dispose()
        {
            if (bluetoothWatcher != null)
            {
                if (bluetoothWatcher.Status == BluetoothLEAdvertisementWatcherStatus.Started)
                {
                    bluetoothWatcher.Stop();
                }
                bluetoothWatcher = null;
            }
        }
    }

And here is the code for the Watcher_Received event handler:

private void Watcher_Received(BluetoothLEAdvertisementWatcher sender, BluetoothLEAdvertisementReceivedEventArgs args)
        {
            try
            {
                //we won't queue packets until registered
                if (!ApplicationContext.Current.Details.ReceiverId.HasValue)
                    return;

                IPacketFrame frame;
                PacketFrameParseResult result = ParseFrame(args, out frame);

                if (result == PacketFrameParseResult.Success)
                {
                    ApplicationContext.Current.Details.BluetoothPacketCount++;
                }

                short rssi = args.RawSignalStrengthInDBm;
                string message = FormatPacketForDisplay(args, args.AdvertisementType, rssi, frame, result);

                if (BluetoothPacketReceived != null)
                {
                    BluetoothPacketReceived.Invoke(this, new BluetoothPacketReceivedEventArgs(message, result, frame, rssi));
                }
            }
            catch (Exception ex)
            {
                if (ex.InnerException is Exceptions.PacketFrameParseException && (ex.InnerException as Exceptions.PacketFrameParseException).Result == PacketFrameParseResult.InvalidData)
                {
                    // noop
                }
                else
                {
                    Logger.Log(LogLevel.Warning, "BLE listener caught bluetooth packet error: {0}", ex);

                    if (BluetoothPacketError != null)
                    {
                        BluetoothPacketError.Invoke(this, new BluetoothPacketErrorEventArgs(ex));
                    }
                }
            }
        }

You can see here that the entire managed callback is wrapped in a try catch and doesn't rethrow, so I'm not sure if there's anything further I can do to prevent the native exception from bringing the application down.

Current thinking, based on this: RtlpEnterCriticalSectionContended is it a parallel event handler, the native side is raising the handler, and it raises for a new event in the same thread while the previous handler is still executing from a previous event? Then this is a race condition on the critical section that causes the crash?

Edit 8: To test this theory, I replaced the contents of received with a read + push to a concurrent queue, allowing the managed code to exit the event handler as quickly as possible. Then added a seperate thread reading from the concurrent queue to perform my application side processing. Initially, I thought this had resolved the issue as the application actively (listening) ran for approximately 15 hours, however it crashed again this morning with the same symptoms.

Edit 8: Following suggestions in the comments, we tried to ensure that we didn't dispose/GC the watcher after a stop prior to the receive completing. We did this by using a TaskCompletionSource to function as a promise, subscribing to the Stopped event so we could await on the completion source task which would only have a result set when the Stopped event had fired.

We also used a lock (Monitor.Enter) in both StopAsync and Received to ensure that both could not be running in parallel. This appeared to reduce the speed at which the system could process events which would make sense if the BLE packets were arriving in parallel. Updated code as follows:

if ((DateTime.Now - this.LastStartedTimestamp).TotalSeconds > 60)
                    {
    if (this.LastStopReason != BluetoothWatcherStopReason.DeviceCharacteristicWorker)
    {
        Logger.Log(LogLevel.Debug, "Stopping bluetooth watcher...");
        // restart watcher every 10 mins
        await this.StopAsync(BluetoothWatcherStopReason.AutomaticRestart);

        //start again if automatic restart
        Logger.Log(LogLevel.Debug, "Starting bluetooth watcher...");
        this.Start(this.testMode);
        Logger.Log(LogLevel.Debug, "Started bluetooth watcher");
        this.LastStartedTimestamp = DateTime.Now;
    }
}

        private void Watcher_Stopped(BluetoothLEAdvertisementWatcher sender, BluetoothLEAdvertisementWatcherStoppedEventArgs args)
{
    string error = args.Error.ToString();

    Logger.Log(LogLevel.Warning, string.Format("BLE listening stopped because {0}...", error));
    LastError = args.Error;

    if (BluetoothWatcherStopped != null)
    {
        BluetoothWatcherStopped.Invoke(sender, args);
    }
}

public class ReceivedBluetoothAdvertismentPacketItem
{
    public DateTime Timestamp { get; set; }
    public BluetoothLEAdvertisementType Type { get; set; }
    public byte[] Buffer { get; set; }
    public short Rssi { get; set; }
}
ConcurrentQueue<ReceivedBluetoothAdvertismentPacketItem> BluetoothPacketsReceivedQueue = new ConcurrentQueue<ReceivedBluetoothAdvertismentPacketItem>();
private void Watcher_Received(BluetoothLEAdvertisementWatcher sender, BluetoothLEAdvertisementReceivedEventArgs args)
{
    bool lockWasTaken = false;
    try
    {
        //this prevents stop until we're exiting Received
        Monitor.Enter(BluetoothWatcherEventSynchronisation, ref lockWasTaken);
        if (!lockWasTaken)
        {
            return;
        }
        //we won't queue packets until registered
        if (!ApplicationContext.Current.ReceiverDetails.ReceiverId.HasValue)
            return;

        BluetoothLEAdvertisementType type = args.AdvertisementType;
        byte[] buffer = GetManufacturerData(args.Advertisement);
        short rssi = args.RawSignalStrengthInDBm;
        BluetoothPacketsReceivedQueue.Enqueue(new ReceivedBluetoothAdvertismentPacketItem
        {
            Timestamp = DateTime.UtcNow,
            Type = type,
            Rssi = rssi,
            Buffer = buffer
        });
        ApplicationContext.Current.ReceiverDetails.UnprocessedQueueLength = BluetoothPacketsReceivedQueue.Count;
    }
    catch (Exception ex)
    {
        Logger.Log(LogLevel.Warning, "BLE listener caught bluetooth packet error: {0}", ex);
        if (BluetoothPacketError != null)
        {
            BluetoothPacketError.Invoke(this, new BluetoothPacketErrorEventArgs(ex));
        }
    }
    finally
    {
        if (lockWasTaken)
        {
            Monitor.Exit(BluetoothWatcherEventSynchronisation);
        }
    }
}


public BluetoothWatcherStopReason LastStopReason { get; private set; } = BluetoothWatcherStopReason.Unknown;
private object BluetoothWatcherEventSynchronisation = new object();
public Task<BluetoothWatcherStopReason> StopAsync(BluetoothWatcherStopReason reason)
{
    var promise = new TaskCompletionSource<BluetoothWatcherStopReason>();

    if (bluetoothAdvertisementWatcher != null)
    {
        LastStopReason = reason;
        UpdateBluetoothStatusInReceiverModel(BluetoothLEAdvertisementWatcherStatus.Stopped); //actually stopping but we lie
        bool lockWasTaken = false;
        try
        {
            Monitor.Enter(BluetoothWatcherEventSynchronisation, ref lockWasTaken);
            {
                bluetoothAdvertisementWatcher.Received -= Watcher_Received;
                bluetoothAdvertisementWatcher.Stopped += (sender, args) =>
                {
                    // clean up
                    if (bluetoothAdvertisementWatcher != null)
                    {
                        bluetoothAdvertisementWatcher.Stopped -= Watcher_Stopped;
                        bluetoothAdvertisementWatcher = null;
                    }
                    //notify continuation
                    promise.SetResult(reason);
                };
                bluetoothAdvertisementWatcher.Stop();
            }
        }
        finally
        {
            if (lockWasTaken)
            {
                Monitor.Exit(BluetoothWatcherEventSynchronisation);
            }
        }
    }
    base.Stop();
    return promise.Task;
}

Following these changes, the same crash is still occuring in the Windows.Devices.Bluetooth native assembly (as per above)

Edit 9: I've removed the automatic periodic start/stop and now the app has been stable for > 36 hours without a crash. So something inside this flow is causing the crashes. We originally added that to work around an issue with the advertisment watcher just stopping after a while, so we'd like to keep it if we can fix it. The if statement if ((DateTime.Now - this.LastStartedTimestamp).TotalSeconds > 60) (and block) is currently commented.

I have opened a bug for windows universal here: https://wpdev.uservoice.com/forums/110705-universal-windows-platform/suggestions/37623343-bluetoothleadvertismentwatcher-advertismentreceiv

c#
uwp
asked on Stack Overflow May 10, 2019 by agrath • edited May 19, 2019 by agrath

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0