How to fix: 'After an MS SQL Server restart, the Alpine .NET Core 2.2.5 container that uses it has a CLOSE_WAIT TCP connection and dotnet CPU usage rises'

I have a problem with an app hosted in the official ASP.NET Core Alpine container, with SQL Server running on another server. After the server that hosts SQL Server is restarted, the containers show high CPU usage and some internal threads hang. Only restarting the container helps. We found that some TCP connections are in the CLOSE_WAIT state when this happens. Some information about the app and the servers:

Details about app:

  • .NET Core 2.2 (C#)
  • based on the official Alpine container (mcr.net.core.asp:2.2.5-alpine3.9)
  • uses Dapper as well as ADO.NET
  • uses async methods with async/await
  • uses Quartz.NET for job scheduling

Details about Docker:

  • hosted on CentOS 7
  • uses Docker 19.03

Details about SQL Server:

  • MS SQL 2014 Standard x64
  • Windows Server 2012R2
  • hosted on Virtual Machine

Detailed problem description:

The container with the app runs 24/7 on a Linux server (CentOS 7) on which Docker is installed. On the same server there is a virtual machine with Windows Server and MS SQL Server 2014 installed on it. The app copes fine with network issues and similar problems, but after the server running SQL Server is restarted I get this error:

[19-08-09 04:15:44.15 ERR -- SSI.Pojazd 0216NPIK] Job SSI.Pojazd.retry_Sms_RetryJob`1 threw an unhandled Exception: 
System.Data.SqlClient.SqlException (0x80131904): SHUTDOWN is in progress.
Login failed for user 'XXXX'.
Cannot continue the execution because the session is in the kill state.
A severe error occurred on the current command.  The results, if any, should be discarded.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName, Boolean shouldReconnect)
   at System.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName)
   at Transport.Core.Abstractions.Database.TransportStoredProcedureWithResult`2.Execute(TInput input)
   at Transport.Jobs.RetriesJobs.RetryJob`1.Execute(IJobExecutionContext context)
   at Quartz.Core.JobRunShell.Run(CancellationToken cancellationToken)
ClientConnectionId:1d7f1d2d-f80b-42cf-906f-f0ee57b14f59
Error Number:6005,State:1,Class:14

After this error everything starts to behave strangely:

  • CPU usage of the container rises from 0.9 to 100 and more
  • CPU consumed by the dotnet processes on the Linux host rises from 1-2 to 12-14 cores (depending on how many apps are running)
  • the health check sometimes hangs: the server does not answer a simple curl request
  • there are CLOSE_WAIT TCP connections to SQL Server
  • if the app uses Quartz.NET, jobs stop firing (concurrency is disabled) and hang on the last call

TCP connection log:

tcp        0      0 173.25.0.2:44920        10.6.67.122:5672        ESTABLISHED
tcp        0      0 173.25.0.2:47488        10.12.128.12:1433       ESTABLISHED
tcp        0      0 173.25.0.2:47246        10.12.128.12:1433       ESTABLISHED
tcp        0      0 173.25.0.2:46785        10.6.67.122:5672        ESTABLISHED
tcp        0      0 173.25.0.2:45556        10.12.128.12:1433       CLOSE_WAIT
tcp        0      0 173.25.0.2:45520        10.12.128.12:1433       CLOSE_WAIT 
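
A listing like the one above can also be summarized in code, which makes it easier to watch for the CLOSE_WAIT state across many hosts. The following is a minimal sketch, assuming netstat-style lines in exactly the column layout shown; `TcpLog` and `CountByState` are hypothetical names used for illustration and are not part of the application.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TcpLog
{
    // Hypothetical helper: counts connections to one remote port, grouped by TCP state.
    // Assumed columns: proto, recv-q, send-q, local address, remote address, state.
    public static Dictionary<string, int> CountByState(IEnumerable<string> lines, int remotePort)
    {
        string suffix = ":" + remotePort;
        return lines
            // Split on any whitespace and drop empty fields.
            .Select(line => line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            // Keep only well-formed rows whose remote address ends with the port.
            .Where(f => f.Length >= 6 && f[4].EndsWith(suffix, StringComparison.Ordinal))
            // Group by the state column and count each group.
            .GroupBy(f => f[5])
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Applied to the six lines above with remote port 1433, this yields two ESTABLISHED and two CLOSE_WAIT connections; the two connections to port 5672 are ignored.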

What I tried to do:

  • Use a CancellationToken as a way to cancel the DB operation in SqlClient.dll

      [Dapper code]
      public async Task Execute(TInput input, IScope scope,CancellationToken token)
      {
          await scope.GetConnection().ExecuteAsync(new CommandDefinition(GetStoredProcedureName(), GetParameters(input), scope.GetTransaction(),
          commandType: CommandType.StoredProcedure,cancellationToken:token)).ConfigureAwait(false);
      }
    
      [ADO.NET CODE]
      public async Task CheckAsync(string connectionString, int timeout, CancellationToken cancellationToken)
      {
          try
          {
              SqlConnectionStringBuilder connectionStringBuilder =
                  new SqlConnectionStringBuilder(connectionString) {ConnectTimeout = 2};
    
              using (var conn = new SqlConnection(connectionStringBuilder.ToString()))
              {
                  try
                  {
                      cancellationToken.ThrowIfCancellationRequested();
                      await conn.OpenAsync(cancellationToken).ConfigureAwait(false);
                      using (var cmd = conn.CreateCommand())
                      {
                          cmd.CommandText = "Select 1";
                          cmd.CommandTimeout = timeout;
                          cancellationToken.ThrowIfCancellationRequested();
                          await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
                      }
                  }
                  catch (TaskCanceledException tce)
                  {
                      TransportLogger.Log.Debug("Task cancelled " + tce.Message + tce.StackTrace + " " + (tce.InnerException == null ? string.Empty : tce.InnerException.Message));
                      throw;
                  }
                  finally
                  {
                      if (conn.State == ConnectionState.Open) conn.Close();
                  }
    
    
              }
          }
          catch (Exception ex)
          {
              TransportLogger.Log.Error(ex, "Cannot check db accessibility");
              throw;
          }
      }
    
  • Clear the pool using the ClearPool and ClearAllPools methods

      public class CheckDatabaseIsAccessible : ICheckDatabaseIsAccessible
      {
          public async Task CheckAsync(string connectionString, int timeout, CancellationToken cancellationToken)
          {
              try
              {
                  SqlConnectionStringBuilder connectionStringBuilder =
                      new SqlConnectionStringBuilder(connectionString) {ConnectTimeout = 2};
    
                  using (var conn = new SqlConnection(connectionStringBuilder.ToString()))
                  {
                      try
                      {
                          cancellationToken.ThrowIfCancellationRequested();
                          await conn.OpenAsync(cancellationToken).ConfigureAwait(false);
                          using (var cmd = conn.CreateCommand())
                          {
                              cmd.CommandText = "Select 1";
                              cmd.CommandTimeout = timeout;
                              cancellationToken.ThrowIfCancellationRequested();
                              await cmd.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
                          }
                      }
                      catch (TaskCanceledException tce)
                      {
                          TransportLogger.Log.Debug("Task cancelled " + tce.Message + tce.StackTrace + " " + (tce.InnerException == null ? string.Empty : tce.InnerException.Message));
                          throw;
                      }
                      finally
                      {
                          if (conn.State == ConnectionState.Open) conn.Close();
                      }
    
    
                  }
              }
              catch (Exception ex)
              {
                  TransportLogger.Log.Error(ex, "Cannot check db accessibility");
                  throw;
              }
          }
    
          public void Check(string connectionString, int timeout)
          {
              try
              {
                  SqlConnectionStringBuilder connectionStringBuilder =
                      new SqlConnectionStringBuilder(connectionString) {ConnectTimeout = 2};
    
                  using (var conn = new SqlConnection(connectionStringBuilder.ToString()))
                  {
                      try
                      {
                          TransportLogger.Log.Debug("Open connection");
                          conn.Open();
                          TransportLogger.Log.Debug("connection open");
                          using (var cmd = conn.CreateCommand())
                          {
                              cmd.CommandText = "Select 1";
                              cmd.CommandTimeout = timeout;
                              TransportLogger.Log.Debug("Command executing...");
                              cmd.ExecuteNonQuery();
                              TransportLogger.Log.Debug("Command executed");
                          }
                      }
                      catch (TaskCanceledException tce)
                      {
                          TransportLogger.Log.Debug("Task cancelled " + tce.Message + tce.StackTrace + " " + (tce.InnerException == null ? string.Empty : tce.InnerException.Message));
                          throw;
                      }
                      finally
                      {
                          if (conn.State == ConnectionState.Open) conn.Close();
                      }
    
    
                  }
              }
              catch (Exception ex)
              {
                  TransportLogger.Log.Error(ex, "Cannot check db accessibility");
                  throw;
              }
          }
      }
    

The pool is cleared, but nothing changes and the problem still exists:

  [19-08-09 04:16:37.16 DBG -- SSI.Pojazd 0216NPIK] Task cancelled A task was canceled.   at Transport.Core.Abstractions.Database.CheckDatabaseIsAccessible.CheckAsync(String connectionString, Int32 timeout, CancellationToken cancellationToken) 
  [19-08-09 04:16:37.16 INF -- SSI.Pojazd 0216NPIK] Try to clear pool 
  [19-08-09 04:16:37.16 INF -- SSI.Pojazd 0216NPIK] Pool cleared 
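
The log records a pool clear, but the call itself is not included above. As a rough sketch of what such a call might look like with System.Data.SqlClient: `PoolReset` and `ClearPools` are hypothetical names, and the placement of the two calls is an assumption, not the application's actual code. `SqlConnection.ClearPool` and `SqlConnection.ClearAllPools` are the static methods the library provides for this.

```csharp
using System.Data.SqlClient;

public static class PoolReset
{
    // Hypothetical helper, not taken from the application.
    public static void ClearPools(string connectionString)
    {
        // ClearPool empties the pool associated with this connection's
        // connection string; the connection does not need to be open.
        using (var conn = new SqlConnection(connectionString))
        {
            SqlConnection.ClearPool(conn);
        }

        // ClearAllPools empties every connection pool in the process.
        SqlConnection.ClearAllPools();
    }
}
```

Connections that are in use when the pool is cleared are marked and discarded when they are closed, rather than being returned to the pool.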

I tried to reproduce this in a test environment but failed. We have approximately 400 servers, and this situation is quite common. Has anyone had this problem and found a solution?

Also I read:

c#
sql
docker
.net-core
alpine
asked on Stack Overflow Aug 13, 2019 by Szymon Szczepanski • edited Aug 26, 2020 by Jason Aller


User contributions licensed under CC BY-SA 3.0