I have a problem with an app hosted in the official ASP.NET Core Alpine container, with SQL Server running on another server. After the server that hosts SQL Server is restarted, the container's CPU usage spikes and some internal threads hang. Only restarting the container helps. We found that some TCP connections are in the CLOSE_WAIT state when this happens. Some information about the app and server:
Details about app:
Details about Docker:
Details about SQL Server:
Detailed problem description:
The container with the app runs 24/7 on a Linux server (CentOS 7) with Docker installed. On the same server there is a virtual machine running Windows Server with MS SQL Server 2014 installed. The app copes fine with network issues and similar problems, but after the server running SQL Server is restarted I get this error:
[19-08-09 04:15:44.15 ERR -- SSI.Pojazd 0216NPIK] Job SSI.Pojazd.retry_Sms_RetryJob`1 threw an unhandled Exception:
System.Data.SqlClient.SqlException (0x80131904): SHUTDOWN is in progress.
Login failed for user 'XXXX'.
Cannot continue the execution because the session is in the kill state.
A severe error occurred on the current command. The results, if any, should be discarded.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
at System.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName, Boolean shouldReconnect)
at System.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName)
at Transport.Core.Abstractions.Database.TransportStoredProcedureWithResult`2.Execute(TInput input)
at Transport.Jobs.RetriesJobs.RetryJob`1.Execute(IJobExecutionContext context)
at Quartz.Core.JobRunShell.Run(CancellationToken cancellationToken)
ClientConnectionId:1d7f1d2d-f80b-42cf-906f-f0ee57b14f59
Error Number:6005,State:1,Class:14
After this error everything starts to behave weirdly:
TCP connection log:
tcp 0 0 173.25.0.2:44920 10.6.67.122:5672 ESTABLISHED
tcp 0 0 173.25.0.2:47488 10.12.128.12:1433 ESTABLISHED
tcp 0 0 173.25.0.2:47246 10.12.128.12:1433 ESTABLISHED
tcp 0 0 173.25.0.2:46785 10.6.67.122:5672 ESTABLISHED
tcp 0 0 173.25.0.2:45556 10.12.128.12:1433 CLOSE_WAIT
tcp 0 0 173.25.0.2:45520 10.12.128.12:1433 CLOSE_WAIT
What I have tried:
Use a CancellationToken as a way to cancel the DB operation in SqlClient.dll
[Dapper code]
public async Task Execute(TInput input, IScope scope, CancellationToken token)
{
    await scope.GetConnection().ExecuteAsync(new CommandDefinition(GetStoredProcedureName(), GetParameters(input), scope.GetTransaction(),
        commandType: CommandType.StoredProcedure, cancellationToken: token)).ConfigureAwait(false);
}
[ADO.NET CODE]
public async Task CheckAsync(string connectionString, int timeout, CancellationToken cancellationToken)
{
    try
    {
        SqlConnectionStringBuilder connectionStringBuilder =
            new SqlConnectionStringBuilder(connectionString) {ConnectTimeout = 2};
        using (var conn = new SqlConnection(connectionStringBuilder.ToString()))
        {
            try
            {
                if (cancellationToken.IsCancellationRequested)
                    cancellationToken.ThrowIfCancellationRequested();
                await conn.OpenAsync(cancellationToken).ConfigureAwait(false);
                using (var cmd = conn.CreateCommand())
                {
                    cmd.CommandText = "Select 1";
                    cmd.CommandTimeout = timeout;
                    cancellationToken.ThrowIfCancellationRequested();
                    await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
                }
            }
            catch (TaskCanceledException tce)
            {
                TransportLogger.Log.Debug("Task cancelled " + tce.Message + tce.StackTrace + " " + (tce.InnerException == null ? string.Empty : tce.InnerException.Message));
                throw;
            }
            finally
            {
                if (conn.State == ConnectionState.Open) conn.Close();
            }
        }
    }
    catch (Exception ex)
    {
        TransportLogger.Log.Error(ex, "Cannot check db assebility");
        throw;
    }
}
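For context, the token passed to CheckAsync can be bound to a time limit so that a hung call is abandoned instead of waited on forever. The following is only a minimal sketch under stated assumptions: `DbCheckRunner` and `IsAccessibleAsync` are hypothetical names that are not part of the app, and the interface is redeclared here (with only the async method) purely to keep the sketch self-contained.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Mirrors the CheckAsync signature shown above; redeclared only so the sketch compiles on its own.
public interface ICheckDatabaseIsAccessible
{
    Task CheckAsync(string connectionString, int timeout, CancellationToken cancellationToken);
}

// Hypothetical helper, not part of the original app.
public static class DbCheckRunner
{
    public static async Task<bool> IsAccessibleAsync(
        ICheckDatabaseIsAccessible check, string connectionString, TimeSpan limit)
    {
        // The token source cancels itself once the time limit elapses.
        using (var cts = new CancellationTokenSource(limit))
        {
            try
            {
                await check.CheckAsync(connectionString, (int)limit.TotalSeconds, cts.Token)
                    .ConfigureAwait(false);
                return true;
            }
            catch (OperationCanceledException)
            {
                // TaskCanceledException derives from OperationCanceledException,
                // so both cancellation paths end up here.
                return false;
            }
        }
    }
}
```

Note that cancelling the token only asks the operation to stop. Whether SqlClient actually releases the underlying socket at that point is exactly what is in question here.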
Clear the pool using the ClearPool and ClearAllPools methods
public class CheckDatabaseIsAccessible : ICheckDatabaseIsAccessible
{
    public async Task CheckAsync(string connectionString, int timeout, CancellationToken cancellationToken)
    {
        try
        {
            SqlConnectionStringBuilder connectionStringBuilder =
                new SqlConnectionStringBuilder(connectionString) {ConnectTimeout = 2};
            using (var conn = new SqlConnection(connectionStringBuilder.ToString()))
            {
                try
                {
                    if (cancellationToken.IsCancellationRequested)
                        cancellationToken.ThrowIfCancellationRequested();
                    await conn.OpenAsync(cancellationToken).ConfigureAwait(false);
                    using (var cmd = conn.CreateCommand())
                    {
                        cmd.CommandText = "Select 1";
                        cmd.CommandTimeout = timeout;
                        cancellationToken.ThrowIfCancellationRequested();
                        await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
                    }
                }
                catch (TaskCanceledException tce)
                {
                    TransportLogger.Log.Debug("Task cancelled " + tce.Message + tce.StackTrace + " " + (tce.InnerException == null ? string.Empty : tce.InnerException.Message));
                    throw;
                }
                finally
                {
                    if (conn.State == ConnectionState.Open) conn.Close();
                }
            }
        }
        catch (Exception ex)
        {
            TransportLogger.Log.Error(ex, "Cannot check db assebility");
            throw;
        }
    }

    public void Check(string connectionString, int timeout)
    {
        try
        {
            SqlConnectionStringBuilder connectionStringBuilder =
                new SqlConnectionStringBuilder(connectionString) {ConnectTimeout = 2};
            using (var conn = new SqlConnection(connectionStringBuilder.ToString()))
            {
                try
                {
                    TransportLogger.Log.Debug("Open connection");
                    conn.Open();
                    TransportLogger.Log.Debug("connection open");
                    using (var cmd = conn.CreateCommand())
                    {
                        cmd.CommandText = "Select 1";
                        cmd.CommandTimeout = timeout;
                        TransportLogger.Log.Debug("Command executing...");
                        cmd.ExecuteNonQuery();
                        TransportLogger.Log.Debug("Command executed");
                    }
                }
                catch (TaskCanceledException tce)
                {
                    TransportLogger.Log.Debug("Task cancelled " + tce.Message + tce.StackTrace + " " + (tce.InnerException == null ? string.Empty : tce.InnerException.Message));
                    throw;
                }
                finally
                {
                    if (conn.State == ConnectionState.Open) conn.Close();
                }
            }
        }
        catch (Exception ex)
        {
            TransportLogger.Log.Error(ex, "Cannot check db assebility");
            throw;
        }
    }
}
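The class above only performs the accessibility check; the pool clearing itself happens elsewhere. A minimal sketch of what such a call looks like with System.Data.SqlClient, under stated assumptions: `PoolReset` and `ClearPools` are hypothetical names used for illustration, not the actual production code.

```csharp
using System.Data.SqlClient;

// Hypothetical helper, shown only to illustrate the two pool-clearing calls.
public static class PoolReset
{
    public static void ClearPools(string connectionString)
    {
        // ClearPool needs a connection object to identify the pool;
        // the connection does not have to be open.
        using (var conn = new SqlConnection(connectionString))
        {
            // Marks the pooled connections for this connection string to be discarded.
            SqlConnection.ClearPool(conn);
        }

        // Does the same for every pool in the process.
        SqlConnection.ClearAllPools();
    }
}
```

Connections that are in use when the pool is cleared are only discarded once they are closed, which may be relevant if the hung threads never return theirs.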
The pool is cleared but nothing changes; the problem still exists:
[19-08-09 04:16:37.16 DBG -- SSI.Pojazd 0216NPIK] Task cancelled A task was canceled. at Transport.Core.Abstractions.Database.CheckDatabaseIsAccessible.CheckAsync(String connectionString, Int32 timeout, CancellationToken cancellationToken)
[19-08-09 04:16:37.16 INF -- SSI.Pojazd 0216NPIK] Try to clear pool
[19-08-09 04:16:37.16 INF -- SSI.Pojazd 0216NPIK] Pool cleared
I tried to reproduce this in a test environment but failed. We have approximately 400 servers and this situation is quite common. Has anyone had this problem and found a solution?
Also I read: