I'm working on a predictive model for in-database scoring. The model will reside in SQL Server, and either sp_execute_external_script or the native T-SQL PREDICT function will be used to run it against the data in the database.
I can train a model on data that is already in the database, store it in a table in both R- and SQL-compatible serialization formats, and use it without any problem.
The real model will be based on customer data, and that data will arrive as a flat file. I will be using Visual Studio with R Tools to develop the model.
Once it's done, I want to serialize the model, push it to the database, and use it there.
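For reference, this is roughly the flow I'm using; a minimal sketch, where the file path, connection string, table name, and model formula are placeholders for my real ones:

```r
library(RevoScaleR)  # ships with Microsoft R Client / SQL Server ML Services

# Hypothetical connection string and destination table
connStr <- "Driver=SQL Server;Server=.;Database=ModelDB;Trusted_Connection=yes"

# Train on the flat file (path and formula are illustrative)
train <- rxImport("C:/data/customers.csv")
model <- rxLogit(churn ~ age + balance, data = train)

# Push the model into a table. Note that rxWriteObject serializes the
# object itself (serialize = TRUE by default), so passing an already
# serialized blob double-serializes it -- a common cause of a "corrupt"
# model on the SQL side.
dest <- RxOdbcData(table = "models", connectionString = connStr)
rxWriteObject(dest, "churn_model", model)
```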
I tried to do that using rxWriteObject, and the binaries land in my destination table just fine. However, when I try to run the model, I get errors.
This one occurs when I use PREDICT, and I think the model is simply not serialized/deserialized the right way to be compatible with SQL PREDICT:
Error occurred during execution of the builtin function 'PREDICT' with HRESULT 0x80070057. Model is corrupt or invalid
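As far as I understand, native PREDICT only accepts models saved in the real-time scoring format produced by rxSerializeModel; a model written with plain serialize() or default rxWriteObject settings raises exactly this "corrupt or invalid" error. A sketch of what I believe the PREDICT path should look like (table and column names are assumptions):

```r
# Convert the trained rxLogit model to the real-time scoring format
serialized <- rxSerializeModel(model, realtimeScoringOnly = TRUE)

# Store the raw bytes as-is: disable re-serialization and compression so
# the varbinary(max) column contains exactly what PREDICT expects
dest <- RxOdbcData(table = "models_rt", connectionString = connStr)
rxWriteObject(dest, "churn_model_rt", serialized,
              serialize = FALSE, compress = NULL)
```

And on the T-SQL side:

```sql
DECLARE @model varbinary(max) =
    (SELECT value FROM dbo.models_rt WHERE id = 'churn_model_rt');

SELECT d.*, p.*
FROM PREDICT(MODEL = @model, DATA = dbo.customers AS d)
WITH (churn_Pred float) AS p;
```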
For more native R execution using sp_execute_external_script I get this:
Msg 39004, Level 16, State 20, Line 1
A 'R' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 2, Line 1
An external script error occurred:
During startup - Warning message:
In setJsonDatabasePath(system.file("extdata/capabilities.json", : bytecode version mismatch; using eval
Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "rxLogit"
Calls: source -> withVisible -> eval -> eval -> predict
Error in execution. Check the output for more information.
Error in eval(expr, envir, enclos) : Error in execution. Check the output for more information.
Calls: source -> withVisible -> eval -> eval -> .Call
Execution halted
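For what it's worth, the "no applicable method for 'predict' applied to an object of class \"rxLogit\"" part suggests the in-database script is calling base predict() instead of RevoScaleR's rxPredict, or that the bytes weren't deserialized the same way they were serialized (rxWriteObject gzip-compresses by default, so reading the column directly requires memDecompress first). A sketch of the sp_execute_external_script call I'm aiming for, with assumed table/column names:

```sql
DECLARE @model varbinary(max) =
    (SELECT value FROM dbo.models WHERE id = 'churn_model');

EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        # If the model was written with the default compress = "gzip",
        # decompress before unserializing; skip memDecompress if it was
        # written with compress = NULL
        mod <- unserialize(memDecompress(model_bytes, type = "gzip"))

        # Use rxPredict, not base predict(), for RevoScaleR model objects
        OutputDataSet <- rxPredict(mod, data = InputDataSet)
    ',
    @input_data_1 = N'SELECT age, balance FROM dbo.customers',
    @params = N'@model_bytes varbinary(max)',
    @model_bytes = @model
WITH RESULT SETS ((churn_Pred float));
```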
I switched off gzip compression when I serialized the model and pushed it to SQL using rxWriteObject; while the binary looked better in the table and closer to the working one I had there before, it still gave me the same error.
I think the problem might be in the input data metadata. There are slight data format differences between the training data in SQL and the data in the flat file. While conceptually they are the same, subtle differences may prevent a model trained on the flat-file data from scoring the data in SQL.
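One way I could pin this down is to make the flat-file import types explicit so they match the SQL column types; a sketch, with hypothetical column names and types:

```r
# Explicit column metadata for the flat file, matched to the SQL schema
colInfo <- list(
  age     = list(type = "integer"),
  balance = list(type = "numeric"),
  churn   = list(type = "factor", levels = c("0", "1"))
)
train <- rxImport(RxTextData("C:/data/customers.csv", colInfo = colInfo))
```

Factors in particular seem risky here: mismatched factor levels between training and scoring data would explain a model that works in one place and fails in the other.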
Even if I replicate the R training code and use sp_execute_external_script within SQL to read and use the file data, I get the same problem.
I guess loading the training data from the file into a table that closely mirrors the column types of the data the model will score might work, but that seems like a lot of pain.
Anything obvious that I'm missing in my approach?