How do I run a prediction (scoring) with R integrated in SQL Server using the sp_execute_external_script procedure?


I'm working with credit card fraud data that resides in SQL Server 2016 RTM on my virtual machine, after working through the DeepDive Data Science tutorial on MSDN.
I now want to replicate this tutorial using R integrated in T-SQL and stored procedures. I can run the linear and logistic regression models, print the results as messages, and create stored procedures for both. However, I am unsure how the prediction should be scripted in R when using the sp_execute_external_script procedure.

This is what I have for the linear and logistic regression models.

Edit: the scripts below reflect the changes I made after reading the comments and answers. Help taken from here and here.

Summary Statistics of Fraud Data:

CREATE PROC summary_proc
AS
begin
exec sp_execute_external_script
    @language = N'R',
    @script = N'
                sumOut <- rxSummary(
                                    formula = ~gender + balance + numTrans + numIntlTrans + creditLine, 
                                    data = ccFraud
                                    )
                print(sumOut)
                OutputDataset <- data.frame(serialize(sumOut,NULL))
                ',
    @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudSmall]',
    @input_data_1_name = N'ccFraud',
    @output_data_1_name = N'OutputDataset'
    with result sets ((summary varbinary(max)));
END;

Linear Regression Model:

CREATE PROC linear_model
AS
begin
exec sp_execute_external_script
    @language = N'R',
    @script = N'
                    linModObj <- rxLinMod(
                                            balance ~ gender + creditLine,  
                                            data = ccFraud
                                            ) ;
                    print(linModObj)
                    OutputDataset <- data.frame(serialize(linModObj, NULL)); 
                ',
    @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraud10]',
    @input_data_1_name = N'ccFraud',
    @output_data_1_name = N'OutputDataset'
    with result sets ((linear_model varbinary(max)));
END;

Logistic Regression Model:

create table logit_trained_model (
model varbinary (255)
);
CREATE PROC logit_model
AS
begin
insert into logit_trained_model
exec sp_execute_external_script
    @language = N'R',
    @script = N'
                    logitObj <- rxLogit(
                                        fraudRisk ~ state + gender + cardholder + balance + numTrans + numIntlTrans + creditLine, 
                                        data = ccFraud,
                                        dropFirst = TRUE
                                        );
                    print(logitObj)
                    OutputDataset <- data.frame(serialize(logitObj, NULL));  
                ',
    @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraud10]',
    @input_data_1_name = N'ccFraud',
    @output_data_1_name = N'OutputDataset'
    --with result sets ((logit_model varbinary(max))); 
END;

I want to predict scores based on the logistic regression model.
Here's what I have so far:

Prediction / Scoring:

CREATE PROC prediction
AS
begin
DECLARE @lmodel2 varbinary(max) = (SELECT top 1 model  
                                        FROM logit_trained_model);
exec sp_execute_external_script
    @language = N'R',
    @script = N'
                    logit_model_obj <- unserialize(as.raw(model));
                    print(summary(logit_model_obj))
                    OutputDataset <- rxPredict(
                                            modelObject = logit_model_obj,   
                                            data = ccFraudScore,        
                                            outData = NULL,     
                                            predVarNames = "ccFraudLogitScore",   
                                            type = "link",      
                                            writeModelVars = TRUE,
                                            extraVarsToWrite = "custID",        
                                            overwrite = TRUE
                                            ) ;
                    str(OutputDataset)
                    print(OutputDataset)
                ',
    @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudScore10]',
    @input_data_1_name = N'ccFraudScore',
    @output_data_1_name = N'OutputDataset',
    @params = N'@model varbinary(max)',  
    @model = @lmodel2  
    WITH RESULT SETS ((Score float));
END;

Before I edited the scripts, the error was object 'logitObj' not found. This was because rxPredict referred to logitObj, which was created in a separate sp_execute_external_script call and so did not exist in the script that ran rxPredict.
I have since changed my scripts to insert the serialized logitObj into a table and to read the model back from that table for rxPredict.
All the scripts above reflect that change, but I now get a new error:

Msg 39004, Level 16, State 20, Line 76
A 'R' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 1, Line 76
An external script error occurred:
Error in unserialize(as.raw(model)) : read error
Calls: source -> withVisible -> eval -> eval -> unserialize

Error in ScaleR.  Check the output for more information.
Error in eval(expr, envir, enclos) :
  Error in ScaleR.  Check the output for more information.
Calls: source -> withVisible -> eval -> eval -> .Call
Execution halted
Msg 11536, Level 16, State 1, Line 78
EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.


From what I understand, R cannot read the variable @model. To check, I ran the query that populates @lmodel2 (SELECT top 1 model FROM logit_trained_model) to see whether it returns anything. It does not: the table has a single column named model and no rows.
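A quick way to confirm this diagnosis is to look at how many rows the table holds and how large the stored value is. The following is only a sketch against the table and procedure defined above. It assumes that logit_model was created but never executed (CREATE PROC only defines a procedure, and the INSERT inside it runs only when the procedure is called). It is also worth noting that the model column is declared varbinary(255), which cannot hold a serialized model longer than 255 bytes.

```sql
-- How many models are stored, and how many bytes does the largest one take?
SELECT COUNT(*)               AS stored_models,
       MAX(DATALENGTH(model)) AS largest_model_bytes
FROM logit_trained_model;

-- Run the training procedure so that its INSERT ... EXEC actually fires.
-- Assumption: with the column declared varbinary(255) this step may fail
-- for a model of realistic size, which would point at the column definition.
EXEC logit_model;

-- Re-check the table after the procedure has run.
SELECT COUNT(*)               AS stored_models,
       MAX(DATALENGTH(model)) AS largest_model_bytes
FROM logit_trained_model;
```

If the first query reports zero rows, @lmodel2 is NULL and unserialize has nothing valid to read, which would be consistent with the read error shown above.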

How do I get this to work?

r
stored-procedures
enterprise
sql-server-2016
asked on Stack Overflow Jun 24, 2016 by Raj • edited Jun 25, 2016 by Minu

1 Answer


You can return the trained model in serialized form, using either output_data_1 or an output parameter, and store it in a database table. Then pass the model back to the prediction script as an input parameter.
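As a rough sketch of the output-parameter variant, the training step might look like the following. The procedure name logit_model_train and the variable @trained are hypothetical, the table definition assumes the original varbinary(255) table is replaced by one with a varbinary(max) column, and the formula and input query are copied from the question. This is one possible arrangement, not necessarily the one the tutorial uses.

```sql
-- Assumption: this replaces the original table, whose varbinary(255)
-- column is too small for a serialized model.
CREATE TABLE logit_trained_model (
    model varbinary(max) NOT NULL
);
GO

CREATE PROCEDURE logit_model_train   -- hypothetical name
AS
BEGIN
    DECLARE @trained varbinary(max);

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            logitObj <- rxLogit(
                fraudRisk ~ state + gender + cardholder + balance +
                            numTrans + numIntlTrans + creditLine,
                data = ccFraud,
                dropFirst = TRUE
            )
            # serialize() returns a raw vector; assigning it to the
            # output parameter hands it back to T-SQL as varbinary(max).
            model <- serialize(logitObj, connection = NULL)
        ',
        @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraud10]',
        @input_data_1_name = N'ccFraud',
        @params = N'@model varbinary(max) OUTPUT',
        @model = @trained OUTPUT;

    -- Persist the serialized model so a later call can load it.
    INSERT INTO logit_trained_model (model) VALUES (@trained);
END;
GO

-- The table is only populated once the procedure is executed.
EXEC logit_model_train;
```

Using an output parameter keeps the model out of the result set, so no WITH RESULT SETS clause is needed for the training call.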

Refer to the In-Database Advanced Analytics for SQL Developers tutorial, specifically step 5 (Train and Save a Model using T-SQL) and step 6 (Operationalize the Model).
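For the scoring side, a minimal sketch along the same lines could read the stored model and pass it to the R script as an input parameter. The procedure name prediction_sketch is hypothetical, and the sketch assumes that logit_trained_model holds one complete serialized model and that custID is an integer column in the scoring table. Only the score and the customer ID are returned, so that the columns match the WITH RESULT SETS clause.

```sql
CREATE PROCEDURE prediction_sketch   -- hypothetical name
AS
BEGIN
    -- Load the serialized model saved by the training step.
    DECLARE @lmodel2 varbinary(max) =
        (SELECT TOP 1 model FROM logit_trained_model);

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            # Rebuild the model object from the varbinary parameter.
            logit_model_obj <- unserialize(as.raw(model))

            pred <- rxPredict(
                modelObject = logit_model_obj,
                data = ccFraudScore,
                outData = NULL,
                predVarNames = "ccFraudLogitScore",
                type = "link",
                writeModelVars = FALSE,
                overwrite = TRUE
            )

            # Return exactly the two columns declared in WITH RESULT SETS.
            OutputDataset <- data.frame(
                custID = ccFraudScore$custID,
                Score  = pred$ccFraudLogitScore
            )
        ',
        @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudScore10]',
        @input_data_1_name = N'ccFraudScore',
        @output_data_1_name = N'OutputDataset',
        @params = N'@model varbinary(max)',
        @model = @lmodel2
        WITH RESULT SETS ((custID int, Score float));
END;
GO

EXEC prediction_sketch;
```

The number and types of columns in OutputDataset have to agree with WITH RESULT SETS, which is why the sketch builds the output data frame explicitly instead of returning everything rxPredict produces.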

answered on Stack Overflow Jun 25, 2016 by Arun Gurunathan

User contributions licensed under CC BY-SA 3.0