We have the following setup currently:
Usually there are no problems, but then the application on the application server suddenly starts reporting errors. The event logs on the database server show that when the application server tries to log in to SQL Server using Windows Authentication (a domain account), the following errors are reported:
Event ID 17806 (MSSQLSERVER):
SSPI handshake failed with error code 0x80090311, state 14 while establishing a connection with integrated security; the connection has been closed. Reason: AcceptSecurityContext failed. The Windows error code indicates the cause of failure. No authority could be contacted for authentication. [CLIENT: IP_of_App_Server]
and
Event ID 18452 (MSSQLSERVER):
Login failed. The login is from an untrusted domain and cannot be used with Windows authentication. [CLIENT: IP_of_App_Server]
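The 0x80090311 value in the 17806 event is a Windows HRESULT. As a minimal sketch, splitting it into its standard bit fields shows a failure (severity 1) from facility 9 (FACILITY_SECURITY, the SSPI subsystem) with code 0x311, which corresponds to SEC_E_NO_AUTHENTICATING_AUTHORITY, matching the "No authority could be contacted for authentication" text. The lookup table is deliberately limited to this one code and is not a complete SSPI error list.

```python
# Minimal sketch: split a Windows HRESULT into its documented bit fields.
# KNOWN_CODES only holds the code seen in Event ID 17806; it is not exhaustive.
KNOWN_CODES = {
    0x80090311: "SEC_E_NO_AUTHENTICATING_AUTHORITY",
}


def decode_hresult(hr: int) -> dict:
    """Return the severity, facility and code fields of a 32-bit HRESULT."""
    hr &= 0xFFFFFFFF  # treat the value as an unsigned 32-bit number
    return {
        "severity": hr >> 31,             # 1 means failure
        "facility": (hr >> 16) & 0x1FFF,  # same mask as the HRESULT_FACILITY macro
        "code": hr & 0xFFFF,              # same mask as the HRESULT_CODE macro
        "name": KNOWN_CODES.get(hr, "unknown"),
    }


if __name__ == "__main__":
    print(decode_hresult(0x80090311))
```

Here facility 9 points at the security/SSPI layer rather than SQL Server itself, which is consistent with the domain-controller errors that follow.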
When I then try to connect to the database server via Remote Desktop with a domain account, it fails with an error stating that the server could not contact AD to verify the account, and it refuses to log me in because Network Level Authentication is enforced. Logging in to the standby database node with a domain account works without problems.
I can still log in with the local administrator account, and the System event log shows the following error:
Event ID 5719 (NETLOGON):
This computer was not able to set up a secure session with a domain controller in domain domain_name due to the following: There are currently no logon servers available to service the logon request. This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.
ADDITIONAL INFO: If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain.
Usually the problem resolves itself after a while. However, this morning there was a long incident that lasted over an hour, and in the end I resolved it by failing the SQL Server service over to another cluster node and then immediately failing it back.
I have searched online and found that most reports of this issue are caused by network problems. However, I can confirm that whenever the problem occurs, my database server can still ping the Active Directory server and resolve its name, so I suspect the issue is OS-related. I also found a Server Fault post suggesting it might be related to KB3002657, which turns out to be installed on this server. However, from those posts and the Microsoft KB article, the problem caused by that patch should affect the servers all the time, and it appears to be mostly related to Windows Server 2003.
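Ping only exercises ICMP, and a successful name lookup only exercises DNS; neither shows that the services a domain member needs on the domain controller are reachable. A minimal sketch of a TCP reachability check is below, assuming a hypothetical port list of common AD services (88 Kerberos, 135 RPC endpoint mapper, 389 LDAP, 445 SMB). Note that Kerberos and LDAP also use UDP, and an open TCP port does not prove authentication will succeed, so this only narrows down whether the failure is at the connection level.

```python
import socket

# Hypothetical port list: common AD services a member server contacts on a DC.
# 88 = Kerberos, 135 = RPC endpoint mapper, 389 = LDAP, 445 = SMB.
DC_PORTS = (88, 135, 389, 445)


def check_tcp_ports(host: str, ports, timeout: float = 2.0) -> dict:
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host and opens a TCP socket
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # covers refused connections, timeouts and DNS errors
            results[port] = False
    return results
```

For example, `check_tcp_ports("dc01.domain.com", DC_PORTS)` (where `dc01.domain.com` is a placeholder name) could be run on the database server while the incident is happening and again once it has cleared, to compare the two results.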
I'm currently out of ideas on this problem and would be grateful if someone could shed some light on it. Thank you.
We have exactly the same configuration, with one difference: we run two SQL Server 2016 nodes on Windows Failover Clustering. The problem was the same. It went away once I added SPN records for the SQL listeners to the SQL Server service domain account; for example, for the listener SQL-listener1, two SPNs: MSSQLSvc/SQL-listener1.domain.com and MSSQLSvc/SQL-listener1.domain.com:1433.
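A minimal sketch of how the two SPNs per listener described above could be generated, together with matching `setspn -S` commands (`-S` adds an SPN after checking for duplicates). The service account `DOMAIN\sqlsvc` and the listener list are hypothetical placeholders, and 1433 is assumed as the port as in the example; registering SPNs also requires rights to write SPNs on that account in AD.

```python
from __future__ import annotations

# Hypothetical placeholders: substitute the real listener FQDNs and the
# domain account that runs the SQL Server service.


def listener_spns(fqdn: str, port: int = 1433) -> list[str]:
    """Return the port-less and port-qualified MSSQLSvc SPNs for one listener."""
    return [f"MSSQLSvc/{fqdn}", f"MSSQLSvc/{fqdn}:{port}"]


def setspn_commands(listeners, service_account: str, port: int = 1433) -> list[str]:
    """Return one `setspn -S <spn> <account>` line per SPN, for every listener."""
    return [
        f"setspn -S {spn} {service_account}"
        for fqdn in listeners
        for spn in listener_spns(fqdn, port)
    ]


if __name__ == "__main__":
    for cmd in setspn_commands(["SQL-listener1.domain.com"], r"DOMAIN\sqlsvc"):
        print(cmd)
```

Generating the commands rather than running them keeps the change reviewable before anything is written to AD.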
User contributions licensed under CC BY-SA 3.0