I'm dealing with Credit Card Fraud data which resides in my SQL Server 2016 RTM Virtual Machine after working through the DeepDive Data Science tutorial on MSDN.
I now want to replicate this tutorial using R integrated into T-SQL and stored procedures. I can run the linear and logistic regression models, print the results as messages, and create stored procedures for both. However, I am confused about how the prediction should be scripted in R when using the sp_execute_external_script procedure.
This is what I have for the linear and logistic regression models.
Edit: the scripts below reflect the changes I made after reading the comments and answers.
Summary Statistics of Fraud Data:
CREATE PROC summary_proc
AS
begin
exec sp_execute_external_script
@language = N'R',
@script = N'
sumOut <- rxSummary(
formula = ~gender + balance + numTrans + numIntlTrans + creditLine,
data = ccFraud
)
print(sumOut)
OutputDataset <- data.frame(serialize(sumOut,NULL))
',
@input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudSmall]',
@input_data_1_name = N'ccFraud',
@output_data_1_name = N'OutputDataset'
with result sets ((summary varbinary(max)));
END;
Linear Regression Model:
CREATE PROC linear_model
AS
begin
exec sp_execute_external_script
@language = N'R',
@script = N'
linModObj <- rxLinMod(
balance ~ gender + creditLine,
data = ccFraud
) ;
print(linModObj)
OutputDataset <- data.frame(serialize(linModObj, NULL));
',
@input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraud10]',
@input_data_1_name = N'ccFraud',
@output_data_1_name = N'OutputDataset'
with result sets ((linear_model varbinary(max)));
END;
Logistic Regression Model:
create table logit_trained_model (
-- varbinary(max): a serialized model is far larger than 255 bytes
model varbinary (max)
);
CREATE PROC logit_model
AS
begin
insert into logit_trained_model
exec sp_execute_external_script
@language = N'R',
@script = N'
logitObj <- rxLogit(
fraudRisk ~ state + gender + cardholder + balance + numTrans + numIntlTrans + creditLine,
data = ccFraud,
dropFirst = TRUE
);
print(logitObj)
OutputDataset <- data.frame(serialize(logitObj, NULL));
',
@input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraud10]',
@input_data_1_name = N'ccFraud',
@output_data_1_name = N'OutputDataset'
--with result sets ((logit_model varbinary(max)));
END;
I want to predict the scores based on the logit regression model.
Here's what I have so far:
Prediction / Scoring:
CREATE PROC prediction
AS
begin
DECLARE @lmodel2 varbinary(max) = (SELECT top 1 model
FROM logit_trained_model);
exec sp_execute_external_script
@language = N'R',
@script = N'
logit_model_obj <- unserialize(as.raw(model));
print(summary(logit_model_obj))
OutputDataset <- rxPredict(
modelObject = logit_model_obj,
data = ccFraudScore,
outData = NULL,
predVarNames = "ccFraudLogitScore",
type = "link",
writeModelVars = TRUE,
extraVarsToWrite = "custID",
overwrite = TRUE
) ;
str(OutputDataset)
print(OutputDataset)
',
@input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudScore10]',
@input_data_1_name = N'ccFraudScore',
@output_data_1_name = N'OutputDataset',
@params = N'@model varbinary(max)',
@model = @lmodel2
WITH RESULT SETS ((Score float));
END;
Before I edited the scripts, the error was: object 'logitObj' not found. This was because I was referring to logitObj inside rxPredict when it was defined outside rxPredict.
I've since changed my script to insert logitObj into a table and to read from that table for rxPredict.
All the scripts above now reflect that change, but here's the new error I'm facing:
Msg 39004, Level 16, State 20, Line 76
A 'R' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 1, Line 76
An external script error occurred:
Error in unserialize(as.raw(model)) : read error
Calls: source -> withVisible -> eval -> eval -> unserialize
Error in ScaleR. Check the output for more information.
Error in eval(expr, envir, enclos) :
Error in ScaleR. Check the output for more information.
Calls: source -> withVisible -> eval -> eval -> .Call
Execution halted
Msg 11536, Level 16, State 1, Line 78
EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.
From what I understand, R is not able to read the variable @model. Just to check, I ran the query SELECT top 1 model FROM logit_trained_model, which populates the variable @lmodel2, to see if it returns anything. It doesn't: the table has a single column named model with no data in it.
How do I get this working?
You can return the trained model in serialized form, using either output_data_1 or an output parameter, and store it in a database table. Then pass the model back to the prediction script as an input parameter.
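As a rough sketch of the output-parameter route, assuming the table and column names from the question: the procedure name logit_model_train and the variable @trained are made up for illustration, and this is not the tutorial's own code. Note that CREATE PROC only defines a procedure. The table stays empty until the procedure is actually executed, which may be why the SELECT in the question returned nothing.

```sql
-- Recreate the model table with varbinary(max); a serialized model
-- will not fit in varbinary(255). (DROP TABLE IF EXISTS needs SQL Server 2016+.)
DROP TABLE IF EXISTS logit_trained_model;
CREATE TABLE logit_trained_model (
    model varbinary(max)
);
GO

-- Hypothetical procedure name, for illustration only.
CREATE PROCEDURE logit_model_train
AS
BEGIN
    DECLARE @trained varbinary(max);

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
logitObj <- rxLogit(
    fraudRisk ~ state + gender + cardholder + balance + numTrans + numIntlTrans + creditLine,
    data = ccFraud,
    dropFirst = TRUE
)
# serialize() returns a raw vector, which comes back to SQL Server as varbinary
model <- serialize(logitObj, NULL)
',
        @input_data_1 = N'SELECT * FROM [DeepDive].[db_datareader].[ccFraud10]',
        @input_data_1_name = N'ccFraud',
        @params = N'@model varbinary(max) OUTPUT',
        @model = @trained OUTPUT;

    -- Persist the serialized model so a later scoring call can read it back.
    INSERT INTO logit_trained_model (model) VALUES (@trained);
END;
GO

-- Creating the procedure does not run it; this call trains and stores the model.
EXEC logit_model_train;

-- Sanity check: expect one row whose length is well above 255 bytes.
SELECT DATALENGTH(model) AS model_bytes FROM logit_trained_model;
```

If the sanity check returns no rows or a NULL length, the scoring script will fail in unserialize() as in the question, so it is worth checking before moving on.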
Refer to the In-Database Advanced Analytics for SQL Developers tutorial, specifically step 5, Train and Save a Model using T-SQL, and step 6, Operationalize the Model.
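For the scoring side, a minimal sketch along the same lines might look like the following. It assumes logit_trained_model already holds one serialized model and reuses the table and column names from the question. It returns only the score column, because every column that writeModelVars or extraVarsToWrite adds to the rxPredict output would also have to be declared in WITH RESULT SETS.

```sql
-- Read the stored model back; this is NULL if the training procedure was never run.
DECLARE @lmodel2 varbinary(max) =
    (SELECT TOP 1 model FROM logit_trained_model);

EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
# The @model parameter arrives in R as a raw vector named model
logit_model_obj <- unserialize(as.raw(model))

scores <- rxPredict(
    modelObject = logit_model_obj,
    data = ccFraudScore,
    outData = NULL,
    predVarNames = "ccFraudLogitScore",
    type = "link"
)

# Keep a single column so the shape matches WITH RESULT SETS below
OutputDataset <- scores["ccFraudLogitScore"]
',
    @input_data_1 = N'SELECT * FROM [DeepDive].[db_datareader].[ccFraudScore10]',
    @input_data_1_name = N'ccFraudScore',
    @output_data_1_name = N'OutputDataset',
    @params = N'@model varbinary(max)',
    @model = @lmodel2
WITH RESULT SETS ((Score float));
```

To return custID alongside the score, one option is to add extraVarsToWrite = "custID" back to the rxPredict call and declare a second column of the matching type in WITH RESULT SETS.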