I'm running a BizTalk production environment on two separate Hyper-V virtual machines; let's call them APP and DB. They are on the same network and both are joined to the domain. The network adapters in both VMs and the virtual switch on the virtualization host use our company's two DNS servers.
The problem is that DB sometimes reports that it cannot authenticate APP, which is a blocker: everything goes down. I can't find any pattern to it. It happens once every 3 to 6 months and seems totally random to me.
At first I blamed the domain, but there is nothing about it in the logs on the domain controllers. I also blamed the network, but neither the network admin nor I have any record of a failure.
Please advise what to monitor and how to detect what is wrong.
From the APP perspective, there is this error:
An attempt to connect to "BizTalkMgmtDb" SQL Server database on server "DB" failed.
Error: "Login failed. The login is from an untrusted domain and cannot be used with Windows authentication."
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="BizTalk Server" />
<EventID Qualifiers="49344">6913</EventID>
<Level>2</Level>
<Task>1</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2020-04-28T13:51:48.000000000Z" />
<EventRecordID>2831238</EventRecordID>
<Channel>Application</Channel>
<Computer>APP.mydomain.com</Computer>
<Security />
</System>
<EventData>
<Data>DB</Data>
<Data>BizTalkMgmtDb</Data>
<Data>Login failed. The login is from an untrusted domain and cannot be used with Windows authentication.</Data>
</EventData>
</Event>
From the DB perspective, I get the following error:
SSPI handshake failed with error code 0x80090311, state 14 while establishing a connection with integrated security; the connection has been closed. Reason: AcceptSecurityContext failed. The Windows error code indicates the cause of failure. [CLIENT: APP IP].
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="MSSQLSERVER" />
<EventID Qualifiers="49152">18452</EventID>
<Level>0</Level>
<Task>4</Task>
<Keywords>0x90000000000000</Keywords>
<TimeCreated SystemTime="2020-04-28T13:51:48.000000000Z" />
<EventRecordID>712620</EventRecordID>
<Channel>Application</Channel>
<Computer>DB.mydomain.com</Computer>
<Security />
</System>
<EventData>
<Data>[CLIENT: APP IP]</Data> <Binary>144800000E0000000900000042005400500052004F004400300032000000070000006D00610073007400650072000000</Binary>
</EventData>
</Event>
and the second:
Login failed. The login is from an untrusted domain and cannot be used with Windows authentication. [CLIENT: APP IP]
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="MSSQLSERVER" />
<EventID Qualifiers="49152">17806</EventID>
<Level>2</Level>
<Task>4</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2020-04-28T13:51:48.000000000Z" />
<EventRecordID>712614</EventRecordID>
<Channel>Application</Channel>
<Computer>DB.mydomain.com</Computer>
<Security />
</System>
<EventData>
<Data>80090311</Data>
<Data>14</Data>
<Data>AcceptSecurityContext failed. The Windows error code indicates the cause of failure.</Data>
<Data>[CLIENT: APP IP]</Data>
<Binary>8E450000140000000900000042005400500052004F00440030003200000000000000</Binary>
</EventData>
</Event>
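To line up the APP and DB events by timestamp, the exported event XML can be reduced to a few fields. The following is only a minimal sketch: `summarize_event` and `SAMPLE` are names made up for illustration, and the sample is a trimmed copy of the 17806 event above. For reference, the SSPI code 0x80090311 corresponds to SEC_E_NO_AUTHENTICATING_AUTHORITY ("No authority could be contacted for authentication"), which would be consistent with DB being unable to reach a domain controller at that moment.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows Event Log XML exports.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Known SSPI status codes; only the one seen in this thread is listed.
SSPI_CODES = {
    "80090311": "SEC_E_NO_AUTHENTICATING_AUTHORITY",
}


def summarize_event(xml_text):
    """Extract the fields needed to correlate events across machines."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    data = [d.text for d in root.findall("e:EventData/e:Data", NS)]
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "time": system.find("e:TimeCreated", NS).get("SystemTime"),
        "computer": system.find("e:Computer", NS).text,
        "data": data,
        # Name of the first recognised SSPI code in the data, if any.
        "sspi": next((SSPI_CODES[d] for d in data if d in SSPI_CODES), None),
    }


# Trimmed copy of the 17806 event from the DB server (illustrative only).
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="MSSQLSERVER" />
<EventID Qualifiers="49152">17806</EventID>
<TimeCreated SystemTime="2020-04-28T13:51:48.000000000Z" />
<Computer>DB.mydomain.com</Computer>
</System>
<EventData>
<Data>80090311</Data>
<Data>14</Data>
</EventData>
</Event>"""

info = summarize_event(SAMPLE)
print(info["time"], info["computer"], info["event_id"], info["sspi"])
```

Running the same function over exports from APP, DB, and the domain controllers and sorting by `time` should make it easier to see which machine noticed the problem first.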
After a whole day of investigation today, it looks like something or somebody disabled the NICs on APP and DB (there is no log entry for that, though). It is definitely a network-related problem, but I don't know how to monitor or troubleshoot it. Could there be something wrong with Hyper-V itself?
Do you have cluster failover for your host instances? I encountered a similar problem when clustering our host instances. I opened an incident ticket with Microsoft; after a lot of investigation we weren't able to find the root cause, so we simply uninstalled the cluster service. Microsoft support used a tool (a .NET console app), scheduled to run every 10 seconds, that opened and closed a connection from each BizTalk server to the DB server and logged the exception messages. It's a network-related problem.
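I don't have the Microsoft tool itself, but the idea can be roughly approximated with a small script. The sketch below is not that tool: it is a Python stand-in that only opens and closes a plain TCP connection to the SQL Server port and logs any failure. It does not perform Windows authentication, so it can show network drops but not SSPI errors. The names `probe` and `monitor`, the port 1433, and the 10-second interval are assumptions for illustration.

```python
import logging
import socket
import time


def probe(host, port=1433, timeout=5.0):
    """Open and immediately close a TCP connection; return (ok, error_text)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False, f"{type(exc).__name__}: {exc}"


def monitor(host, port=1433, interval=10.0, max_checks=None):
    """Probe every `interval` seconds, log failures, return the failure count.

    With max_checks=None the loop runs until the process is stopped.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        ok, error = probe(host, port)
        checks += 1
        if not ok:
            failures += 1
            logging.error("connection to %s:%s failed: %s", host, port, error)
        if max_checks is None or checks < max_checks:
            time.sleep(interval)
    return failures
```

Calling `monitor("DB")` on APP (after configuring `logging` to write timestamps to a file) would leave a record of when the connection failed, which can then be compared with the event log entries on both machines.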
I have discovered this thread:
There is an issue with Broadcom drivers and VMQ settings that applies to my environment.
I'll try turning VMQ off in my next service window.