We have a customer with a Windows Server 2016 domain controller. It's a small business so their server infrastructure consists of a Hyper-V host and this DC. The DC hosts file shares and Azure AD Connect for syncing identity with Office 365.
We monitor for event ID 4625 and have an alerting threshold to help us identify potential brute force attacks against the network.
In October of last year we began receiving alerts that the failed logon alert threshold had been exceeded. Upon investigation we have the following description of the problem:
vssadmin list writers
The list of troubleshooting over the last several months is long. This is not a comprehensive list:
- rundll32 keymgr.dll,KRShowKeyMgr (no credentials cached)
A useful thing learned during all of this: if I run vssadmin list writers continually during the installation process, the errors begin immediately after the SQL components are installed, before the installer has even finished running.
The problem is clearly related to AADC because I can stop the AAD sync service or uninstall AADC and all failed logon events go away. But uninstalling AADC, deleting the AADC folders, cleaning out the AADC user accounts, and clearing the AADC registry entries to try to get a truly fresh install has no effect; the errors return immediately when I reinstall AADC.
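Running vssadmin list writers over and over by hand is tedious, so here is a minimal sketch of scripting it in Python so that each run gets a timestamp that can be lined up against the 4625 events. The function name, the iteration count, and the interval are assumptions for illustration only; it needs to run from an elevated prompt on the server.

```python
import subprocess
import time
from datetime import datetime


def poll_vss_writers(iterations, interval_seconds=5.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run `vssadmin list writers` repeatedly and record when each run started.

    `run` and `sleep` are injectable only so the loop can be exercised
    without actually calling vssadmin (hypothetical helper, not a tool API).
    """
    stamps = []
    for i in range(iterations):
        started = datetime.now()
        result = run(["vssadmin", "list", "writers"],
                     capture_output=True, text=True)
        # Keep the start time and exit code; compare the times with the
        # failed logon events in the Security log afterwards.
        stamps.append((started.isoformat(timespec="seconds"), result.returncode))
        if i < iterations - 1:
            sleep(interval_seconds)
    return stamps
```

Calling `poll_vss_writers(60, 5.0)` while the installer runs would give roughly five minutes of timestamped runs to compare against the event log.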
At this point I'm at my wits' end and I don't know what else to do or where else to even look. I'm hoping someone out there in the aether knows more than I do (likely) or has experienced this before and found a fix.
One final note - the server's DNS name is 9 characters long, meaning that it does not match its NETBIOS name. I don't think this is the cause, but if necessary I can rename the server. It's just a bit of a headache to do for an in-production DC & file server.
This problem originally began occurring in October of 2019. It took almost a year but I finally found a solution which hints at a possible explanation.
The solution was to configure the following registry key and value:
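For reference, Microsoft documents BackConnectionHostNames as a multi-string (REG_MULTI_SZ) value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0. A minimal sketch of setting it with Python's winreg module is below. The host names are placeholders rather than this server's actual names, the helper function names are made up for illustration, and the script has to run elevated.

```python
import sys

LSA_MSV1_0 = r"SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0"
VALUE_NAME = "BackConnectionHostNames"


def merge_host_names(existing, wanted):
    """Return existing names plus any new ones, de-duplicated case-insensitively."""
    merged = list(existing)
    seen = {name.lower() for name in merged}
    for name in wanted:
        if name.lower() not in seen:
            merged.append(name)
            seen.add(name.lower())
    return merged


def set_back_connection_host_names(wanted):
    """Add names to BackConnectionHostNames (Windows only, needs admin rights)."""
    import winreg  # imported here so merge_host_names stays usable off Windows

    access = winreg.KEY_QUERY_VALUE | winreg.KEY_SET_VALUE
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, LSA_MSV1_0, 0, access) as key:
        try:
            existing, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
        except FileNotFoundError:
            existing = []  # value not created yet
        merged = merge_host_names(existing or [], wanted)
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_MULTI_SZ, merged)
    return merged


if __name__ == "__main__" and sys.platform == "win32":
    # Placeholder names: substitute the server's real NetBIOS and DNS names.
    print(set_back_connection_host_names(["SERVERNB", "servername.example.local"]))
```

Merging with whatever is already in the value, instead of overwriting it, avoids dropping names that another application may have added.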
This resolved the failed logins whenever VSS ran against a SQL database.
This is part of a security function introduced in Windows Server 2003 called Loopback Check Functionality.
From what I've read about how Loopback Check Functionality works, what I believe is going on is that whenever VSS logs on to SQL to perform a backup, it logs on as SYSTEM. LSA expects the logon for SYSTEM to come from the server's DNS name, but the logon is actually coming from the server's NETBIOS name. Because the DNS name does not match the NETBIOS name in this case, LSA fails the Kerberos authentication and the login falls back to NTLM which accepts the NETBIOS name.
By configuring BackConnectionHostNames we tell LSA to accept the connection from both the NETBIOS and DNS names, and Kerberos authentication succeeds.
I was able to trace the error by using Sysinternals Process Monitor to track down everything that VSS was doing when the errors occurred. I found VSS accessing folders located at C:\Users\{AzureADConnect Account}\AppData\Local\Microsoft\Microsoft SQL Server Local DB\Instances\ADSync where I found error.log files. These logs contained the following error:
2020-08-13 13:00:47.43 Logon Error: 17806, Severity: 20, State: 14.
2020-08-13 13:00:47.43 Logon SSPI handshake failed with error code 0x8009030c, state 14 while establishing a connection with integrated security; the connection has been closed. Reason: AcceptSecurityContext failed. The Windows error code indicates the cause of failure. The logon attempt failed [CLIENT: <named pipe>]
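To check other error.log files for the same pair of entries, a small scan along these lines may help. It is only a sketch: the function name and the regular expressions are my own and are based solely on the two log lines quoted above, so other log layouts may need adjusting.

```python
import re

# Header line, e.g.
# "2020-08-13 13:00:47.43 Logon Error: 17806, Severity: 20, State: 14."
ERROR_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<source>\S+)\s+Error: (?P<error>\d+), "
    r"Severity: (?P<severity>\d+), State: (?P<state>\d+)\."
)
# Detail line that carries the SSPI error code.
SSPI_CODE = re.compile(r"SSPI handshake failed with error code (0x[0-9a-fA-F]+)")


def find_sspi_failures(lines):
    """Return one dict per error 17806 header followed by an SSPI detail line."""
    results = []
    pending = None  # most recent header line, if the previous line was one
    for line in lines:
        header = ERROR_LINE.match(line)
        if header:
            pending = header.groupdict()
            continue
        code = SSPI_CODE.search(line)
        if code and pending and pending["error"] == "17806":
            results.append({**pending, "sspi_code": code.group(1)})
        pending = None
    return results
```

Feeding it the lines of an error.log (for example `find_sspi_failures(open(path, encoding="utf-16"))`, assuming the log is UTF-16 encoded, which may not hold for every file) returns the timestamp, state, and SSPI code of each failure.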
This was the breakthrough I needed, since that error information led me to several locations, such as this SE question, which recommended disabling loopback checks entirely. Not wanting to disable a security feature, I continued searching until I found sources (1) and (2) that described how to configure Loopback Check Functionality without disabling it, by creating the registry entry for BackConnectionHostNames as I outlined above.