Error in mutate_impl(.data, dots) : attempt to use zero-length variable name (R Services on SQL Server)

I am using R Services on SQL Server. The following is an example of my code, in which I compute the maximum of several columns per Id using R:

EXECUTE sp_execute_external_script @language = N'R' 
    , @script = N'
        r = order(InputDataSet$Id)
        InputDataSet = InputDataSet[r,]

        library(dplyr)

        OutputDataSet <- InputDataSet %>% group_by(Id) %>% mutate(
                                                   Max_Col1 = max(Col1, na.rm = TRUE),
                                                   Max_Col2 = max(Col2, na.rm = TRUE),
                                                   Max_Col3 = max(Col3, na.rm = TRUE),) %>%  slice(1)
          '
    , @input_data_1 = N'SELECT * FROM table_name;'

This gives me the following error:

Msg 39004, Level 16, State 20, Line 26
A 'R' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 1, Line 26
An external script error occurred: 

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Error in mutate_impl(.data, dots) : 
  attempt to use zero-length variable name
Calls: source ... mutate -> mutate_ -> mutate_.tbl_df -> mutate_impl -> .Call

Error in ScaleR.  Check the output for more information.
Error in eval(expr, envir, enclos) : 
  Error in ScaleR.  Check the output for more information.
Calls: source -> withVisible -> eval -> eval -> .Call
Execution halted

When I execute the same code in RStudio, it runs perfectly, but it fails on SQL Server. I do not understand what this error means.

The R version on my SQL Server is 3.2.2 (Fire Safety), and packageVersion("dplyr") on the SQL Server returns 0.4.3.
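Since the code behaves differently in the two environments, it can help to run the same version check in both and compare the results. The following is only a minimal sketch; the guard around dplyr is an assumption so that the snippet also runs where the package is not installed.

```r
# Report the R version of the current session
r_version <- R.version.string
cat("R:", r_version, "\n")

# Report the dplyr version, if the package is available in this library
if (requireNamespace("dplyr", quietly = TRUE)) {
  cat("dplyr:", as.character(packageVersion("dplyr")), "\n")
} else {
  cat("dplyr is not installed in this library\n")
}
```

Running this once in RStudio and once inside the @script of sp_execute_external_script shows whether the two sessions use the same package versions.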

r
sql-server
asked on Stack Overflow Jan 20, 2020 by heisenbug29

1 Answer

The issue is likely the class of the columns. If they are not numeric, convert them to numeric and it should work:

# No trailing comma after the last argument: an empty argument
# may trigger this error in older dplyr releases.
OutputDataSet <- InputDataSet %>%
    group_by(Id) %>%
    mutate(
        Max_Col1 = max(as.numeric(as.character(Col1)), na.rm = TRUE),
        Max_Col2 = max(as.numeric(as.character(Col2)), na.rm = TRUE),
        Max_Col3 = max(as.numeric(as.character(Col3)), na.rm = TRUE)) %>%
    slice(1)
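A minimal sketch of why the conversion goes through as.character() first, assuming a column arrives as a factor. The data frame below is made up for illustration and only stands in for InputDataSet.

```r
# Hypothetical stand-in for InputDataSet with a factor column
InputDataSet <- data.frame(
  Id   = c(1, 1, 2),
  Col1 = factor(c("10", "3", "7"))
)

# Inspect the class of every column
classes <- sapply(InputDataSet, class)
print(classes)

# as.numeric() on a factor returns the internal level codes,
# not the values the labels represent
codes <- as.numeric(InputDataSet$Col1)

# Converting to character first recovers the actual numbers
values <- as.numeric(as.character(InputDataSet$Col1))

print(codes)
print(values)
```

If sapply() reports "numeric" or "integer" for every column, the conversion is unnecessary and the cause lies elsewhere.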

With newer versions of dplyr, the same computation can be written with mutate_at:

OutputDataSet <- InputDataSet %>%
    type.convert(as.is = TRUE) %>%  # re-infer the column types
    group_by(Id) %>%
    mutate_at(vars(starts_with("Col")), list(Max = ~ max(., na.rm = TRUE))) %>%
    slice(1)
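Where the installed dplyr is too old for mutate_at, the per-group maximum can also be sketched with base R's aggregate() instead of dplyr. This is a swapped-in technique, not the code from the question: it returns one row per Id holding only the maxima, and the sample data is hypothetical.

```r
# Hypothetical stand-in for InputDataSet
InputDataSet <- data.frame(
  Id   = c(2, 1, 1, 2),
  Col1 = c(5, 1, 9, NA),
  Col2 = c(0.5, 2.5, 1.5, 4.0),
  Col3 = c(7, 8, NA, 3)
)

cols <- c("Col1", "Col2", "Col3")

# Per-Id maximum of each column; na.rm = TRUE is passed through to max()
OutputDataSet <- aggregate(
  InputDataSet[cols],
  by  = list(Id = InputDataSet$Id),
  FUN = max,
  na.rm = TRUE
)

# Rename the value columns to Max_Col1, Max_Col2, Max_Col3
names(OutputDataSet)[-1] <- paste0("Max_", cols)

print(OutputDataSet)
```

The result is sorted by Id, so the explicit order() step from the question is not needed in this variant.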
answered on Stack Overflow Jan 20, 2020 by akrun • edited Jan 20, 2020 by akrun

User contributions licensed under CC BY-SA 3.0