R integration in SQL Server 2016 (CTP3): I am using the new sp_execute_external_script to create a linear regression model. Is there a way to send the coefficients of the trained model to output_data_1_name?
For example, in the body of the R-Script, if you issue: print(myModel); It prints this in the SSMS output window (not output_data):
Call:
lm(formula = DepVar ~ IndepVar1 + IndepVar2, data = myDemoData)
Coefficients:
(Intercept) IndepVar1 IndepVar2
123.456 25.456 56.382
Is it possible to get this into a data frame? That would be preferable, as I also want to get the t-values and R-squared and store them in a table. Even a varchar(max) would be fine; I'd just parse it myself.
Here is what I've tried most recently:
declare @rx_model varbinary(max) = (select model from dbo.Mymodel)
exec dbo.sp_execute_external_script
@language = N'R',
@script = N'require("RevoScaleR");
Mymodel <- unserialize(rx_model);
Mymodelsummary = summary(Mymodel);
A1 = Mymodelsummary[1];
A2 = Mymodelsummary[2];
A3 = Mymodelsummary[3];
A4 = Mymodelsummary[4];
A5 = Mymodelsummary[5];
summary_Text = data.frame( c(A4, A5) ); ',
@input_data_1 = N'',
@input_data_1_name = N'',
@output_data_1_name = N'summary_Text',
@params = N'@rx_model varbinary(max)',
@rx_model = @rx_model
with result sets (("A4" nvarchar(max), "A5" nvarchar(max) ));
The error I'm getting in SQL Server 2016 CTP3 is:
Msg 39004, Level 16, State 20, Line 0
A 'R' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 1, Line 0
An external script error occurred:
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ""summary.rxLinMod"" to a data.frame
Calls: source ... data.frame -> as.data.frame -> as.data.frame.default
Error in ScaleR. Check the output for more information.
Error in eval(expr, envir, enclos) :
Error in ScaleR. Check the output for more information.
Calls: source -> withVisible -> eval -> eval -> .Call
Execution halted
Msg 11536, Level 16, State 1, Line 2
EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.
So I'm wondering how to get that output out of sp_execute_external_script in SQL. MSDN does not cover much about R itself. SQL is complaining that the output from the model cannot be "coerced" to a data frame. I'm wondering what manipulation in the R-Script can be done to "tease" it into a dataframe.
If you want to get an nvarchar, you can try something like:
EXEC sp_execute_external_script
@language = N'R'
, @script = N'
mymodel <- lm(formula = DepVar ~ IndepVar1 + IndepVar2, data = myDemoData);
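# The output variable must be a data frame, so wrap the pasted string in one.
coefficients <- data.frame(coefficients = paste(names(mymodel$coefficients), mymodel$coefficients, sep="=", collapse = " "), stringsAsFactors = FALSE);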
'
, @input_data_1 = N'select DepVar, IndepVar1, IndepVar2 from myDemoData'
, @input_data_1_name = N'myDemoData'
, @output_data_1_name = N'coefficients'
WITH RESULT SETS (( coefficients nvarchar(max)));
This should return the string
"(Intercept)=123.456 IndepVar1=25.456 IndepVar2=56.382"
You don't need to convert the model itself into a data frame to return it to SQL Server. If you want to store the entire model in the database, you can serialize it to a raw vector in R and return it as a varbinary(max) output parameter in T-SQL. Alternatively, you can extract individual components from the model, such as the coefficients or the error statistics, and return those to SQL.
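As a rough sketch of the first option, the round trip through a raw vector can be tried in plain R before wiring it into sp_execute_external_script. The mtcars data set and the variable names model, model_raw and restored are illustrative assumptions, not part of the original question.

```r
# Illustrative model on the built-in mtcars data set (an assumption for this sketch).
model <- lm(mpg ~ wt + hp, data = mtcars)

# serialize() with connection = NULL returns a raw vector,
# which is the R type that corresponds to varbinary(max) in T-SQL.
model_raw <- serialize(model, connection = NULL)

# unserialize() rebuilds an equivalent model object from the raw vector.
restored <- unserialize(model_raw)

print(is.raw(model_raw))
print(all.equal(coef(model), coef(restored)))
```

Inside the external script, model_raw would be assigned to a varbinary(max) OUTPUT parameter declared in @params rather than printed.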
Here is an example that extracts coefficients as a data frame:
execute sp_execute_external_script
@language = N'R'
, @script = N'
irisModel <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data = iris);
irisCoeff <- summary(irisModel)$coefficients;
OutputDataSet <- cbind(name = row.names(irisCoeff), data.frame(irisCoeff));
'
with result sets((Name nvarchar(100), "Estimate" float, "Std.Error" float, "t.value" float, "Pr.value" float))
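Since the question also asks for t-values and R-squared, here is a minimal sketch of one way to put them in the same data frame. It is plain R that could be pasted into @script, and it assumes an ordinary lm model; the summary of a RevoScaleR rxLinMod model may be structured differently. The column names estimate, t_value and r_squared are made up for this example.

```r
# Fit on the built-in iris data set, as in the example above.
irisModel <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data = iris)
irisSummary <- summary(irisModel)

# The coefficient matrix has the columns "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
irisCoeff <- irisSummary$coefficients

# One row per coefficient. R-squared is a single model-level value,
# so data.frame() recycles it onto every row.
OutputDataSet <- data.frame(
  name = rownames(irisCoeff),
  estimate = irisCoeff[, "Estimate"],
  t_value = irisCoeff[, "t value"],
  r_squared = irisSummary$r.squared,
  stringsAsFactors = FALSE,
  row.names = NULL
)

print(OutputDataSet)
```

With this shape, the WITH RESULT SETS clause would need four matching columns, for example (name nvarchar(100), estimate float, t_value float, r_squared float).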