SQL Server 2016 R Services: sp_execute_external_script returns 0x80004005 error

2

I run some R code after querying 100M records and get the following error after the process runs for over 6 hours:

Msg 39004, Level 16, State 19, Line 300
A 'R' script error occurred during execution of 'sp_execute_external_script'     
with HRESULT 0x80004005.

HRESULT 0x80004005 appears to be associated in Windows with connectivity, permissions, or an "Unspecified" error.

I know from logging in my R code that the process never reaches the R script at all. I also know that the entire procedure completes after 4 minutes on a smaller number of records, for example, 1M. This leads me to believe that this is a scaling problem or some issue with the data, rather than a bug in my R code. I have not included the R code or the full query for proprietary reasons.

However, if this were a disk or memory problem, I would expect it to surface as a 0x80004004 "Out of memory" error instead.

One clue I noticed in the SQL ERRORLOG is the following:

SQL Server received abort message and abort execution for major error : 18 
and minor error : 42

However, the timestamp of this log line does not coincide with the interruption of the process, although it does occur after the process started. Unfortunately, there is precious little on the web about "major error 18".

A SQL trace of the run from SSMS shows the client logging in and out every 6 minutes or so, but I can only assume this is normal keep-alive behaviour.

The sanitized sp_execute_external_script call:

    EXEC sp_execute_external_script
          @language = N'R'
        , @script = N'#We never get here
                      #returns name of output data file'
        , @input_data_1 = N'SELECT TOP 100000000 * FROM DATA'
        , @input_data_1_name = N'x'
        , @output_data_1_name = N'output_file_df'
    WITH RESULT SETS ((output_file varchar(100) NOT NULL));

Server specs: 8 cores, 256 GB RAM, SQL Server 2016 CTP3.

Any ideas, suggestions or debugging hints would be greatly appreciated!

UPDATE: I set TRACE_LEVEL=3 in rlauncher.config to turn on a higher level of logging and re-ran the process. The log reveals a cleanup process that ran, removing session files, at the time the entire process failed after 6.5 hours:

[2016-05-30 01:35:34.419][00002070][00001EC4][Info] SQLSatellite_LaunchSatellite(1, A187BC64-C349-410B-861E-BFDC714C8017, 1, 49232, nullptr) completed: 00000000

[2016-05-30 01:35:34.420][00002070][00001EC4][Info] < SQLSatellite_LaunchSatellite, dllmain.cpp, 223

[2016-05-30 08:04:02.443][00002070][00001EC4][Info] > SQLSatellite_LauncherCleanUp, dllmain.cpp, 309

[2016-05-30 08:04:07.443][00002070][00001EC4][Warning] Session A187BC64-C349-410B-861E-BFDC714C8017 cleanup wait failed with 258 and error 0

[2016-05-30 08:04:07.444][00002070][00001EC4][Info] Session(A187BC64-C349-410B-861E-BFDC714C8017) logged 2 output files

[2016-05-30 08:04:07.444][00002070][00001EC4][Warning] TryDeleteSingleFile(C:\PROGRA~1\MICROS~1\MSSQL1~1.MSS\MSSQL\EXTENS~1\MSSQLSERVER06\A187BC64-C349-410B-861E-BFDC714C8017\Rscript1878455a2528) failed with 32

[2016-05-30 08:04:07.445][00002070][00001EC4][Warning] TryDeleteSingleDirectory(C:\PROGRA~1\MICROS~1\MSSQL1~1.MSS\MSSQL\EXTENS~1\MSSQLSERVER06\A187BC64-C349-410B-861E-BFDC714C8017) failed with 32

[2016-05-30 08:04:08.446][00002070][00001EC4][Info] Session A187BC64-C349-410B-861E-BFDC714C8017 removed from MSSQLSERVER06 user

[2016-05-30 08:04:08.447][00002070][00001EC4][Info] SQLSatellite_LauncherCleanUp(A187BC64-C349-410B-861E-BFDC714C8017) completed: 00000000

It appears the only way to allow my long-running process to continue is to either:

a) extend the job cleanup wait time to allow the job to finish, or
b) disable the job cleanup process.

I have thus far been unable to find the value that sets the job cleanup wait time in the MSSQLLaunchpad service. While a JOB_CLEANUP_ON_EXIT flag exists in rlauncher.config (see the snippet below), setting it to 0 has no effect; the service seems to reset it to 1 when it is restarted.
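
For reference, these are the two rlauncher.config settings discussed above. The file's location is an assumption based on a default MSSQLSERVER instance (typically C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\rlauncher.config), and the Launchpad service must be restarted for changes to take effect:

    TRACE_LEVEL=3
    JOB_CLEANUP_ON_EXIT=0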

Again, any suggestions or assistance would be much appreciated!

sql-server
r
sql-server-2016
microsoft-r
bigdata
asked on Stack Overflow May 28, 2016 by ryang • edited Sep 22, 2017 by Community

5 Answers

2

By default, SQL Server reads all input data into R memory as a data frame before starting execution of the R script. Given that the script works with 1M rows and fails to start with 100M rows, this could potentially be an out-of-memory error. To resolve memory issues (other than increasing memory on the machine or reducing the data size), you can try one of these solutions:

  1. Increase the memory allocation for R process execution using the max_memory_percent setting of the external resource pool (visible in sys.resource_governor_external_resource_pools). By default, SQL Server limits R process execution to 20% of memory.
  2. Use streaming execution for the R script instead of loading all data into memory at once. Note that streaming can only be used in cases where the output of the R script doesn't depend on reading or looking at the entire set of rows. Both options are sketched after this list.
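
A minimal T-SQL sketch of both options. The 50% figure and the 500,000-row chunk size are illustrative assumptions; @r_rowsPerRead is the streaming parameter used by SQL Server R Services:

    -- Option 1: raise the memory cap for external (R) processes
    -- from the default 20% of server memory.
    ALTER EXTERNAL RESOURCE POOL [default] WITH (max_memory_percent = 50);
    ALTER RESOURCE GOVERNOR RECONFIGURE;

    -- Option 2: stream the input data to R in chunks instead of
    -- materializing all rows as a single data frame.
    EXEC sp_execute_external_script
          @language = N'R'
        , @script = N'...'
        , @input_data_1 = N'SELECT * FROM DATA'
        , @params = N'@r_rowsPerRead int'
        , @r_rowsPerRead = 500000;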

The warnings in RLauncher.log about data cleanup happen after the R script execution; they can be safely ignored and are probably not the root cause of the failures you are seeing.

answered on Stack Overflow Jun 1, 2016 by Arun Gurunathan • edited Jun 2, 2016 by Arun Gurunathan
0

Unable to resolve this issue in SQL, I simply avoided the SQL Server Launchpad service, which was interrupting the processing, and pulled the data from SQL Server using the R RODBC package. The pull took just over 3 hours (instead of 6+ using sp_execute_external_script).

This might implicate the SQL Server Launchpad service, and it suggests that memory was not the issue.

answered on Stack Overflow Jun 4, 2016 by ryang
0

Please try your scenario in SQL Server 2016 RTM. There have been many functional and performance fixes made since CTP3.

For more information on how to get SQL Server 2016 RTM, check out the "SQL Server 2016 is generally available today" blog post.

answered on Stack Overflow Jun 6, 2016 by Arun Gurunathan
0

I had almost the same issue with SQL Server 2016 RTM-CU1. My query failed with error 0x80004004 instead of 0x80004005, and it failed beginning at 10,000,000 records, but that could be related to my having only 16 GB of memory and/or different data.

I got around it by using a field list instead of "*". Even if the field list contains all the fields from the data source (a rather complicated view in my case), a query featuring an explicit field list always succeeds, while "SELECT TOP x * FROM ..." always fails for some sufficiently large x. A sketch of the workaround follows.
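
A minimal sketch of the workaround; the table and column names are illustrative assumptions:

    -- Enumerate the columns explicitly instead of using SELECT TOP x * ...
    EXEC sp_execute_external_script
          @language = N'R'
        , @script = N'...'
        , @input_data_1 = N'SELECT TOP 10000000 Col1, Col2, Col3 FROM DATA';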

answered on Stack Overflow Sep 15, 2016 by SLSvenR
0

I've had a similar error (0x80004004), and the problem was that one of the rows in one of the columns contained a "very" special character (I say "very" because other special characters did not cause this error).

When I replaced 'Folkelånet Telefinans' with 'Folkelanet Telefinans', the problem went away.

In your case, maybe at least one of the values in the last 99M rows contains a character like that, and you just have to replace it. I hope that Microsoft will resolve this issue at some point. One way to hunt for such rows is sketched below.
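
A hedged sketch for locating rows that contain characters outside the printable ASCII range; the column name is an illustrative assumption, and the binary collation makes the character-class pattern compare exact code points:

    -- Find rows where SomeColumn contains any character outside
    -- the printable ASCII range (space through tilde).
    SELECT *
    FROM DATA
    WHERE SomeColumn LIKE '%[^ -~]%' COLLATE Latin1_General_BIN;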

answered on Stack Overflow Oct 12, 2017 by Dennis Bondar

User contributions licensed under CC BY-SA 3.0