I am trying to convert pandas DataFrame column types so that the data can be returned through SQL Server using the sp_execute_external_script procedure. Some columns in particular are giving me issues. Sample data:
>>> df.column1
0 NaN
1 1403
2 NaN
3 NaN
4 NaN
Using the method found in another answer (https://stackoverflow.com/a/60779074/3084939) I created a function to do this and return a new series.
def str_convert(series):
    null_cells = series.isnull()
    return series.astype(str).mask(null_cells, np.NaN)
I then do:
df.column1 = str_convert(df.column1)
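For anyone wanting to reproduce this outside SQL Server, here is a minimal sketch of what the helper returns. The stand-in series `column1` is an assumption (an object column holding one integer, guessed from the sample output), and the sketch uses `np.nan` because the `np.NaN` alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

def str_convert(series):
    # Remember which cells are null before the cast to str.
    null_cells = series.isnull()
    # Cast to str, then put a missing value back where the nulls were.
    return series.astype(str).mask(null_cells, np.nan)

# Assumed stand-in for df.column1: an object column holding one integer.
column1 = pd.Series([np.nan, 1403, np.nan, np.nan, np.nan], dtype=object)
converted = str_convert(column1)

print(converted.tolist())           # the values after conversion
print(converted.isnull().tolist())  # which cells are still null
print(type(converted[1]))           # type of the one non-null cell
```

The nulls survive the conversion and the non-null cell becomes a Python str, so the helper does what it was meant to do on the pandas side. The failure appears only when the result is handed back to SQL Server.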
When I run the procedure in management studio I get an error:
Msg 39004, Level 16, State 20, Line 0
A 'Python' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 2, Line 0
An external script error occurred:
C:\SQL\MSSQL14.SQL2017\PYTHON_SERVICES.3.7\lib\site-packages\revoscalepy\functions\RxSummary.py:4: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
from pandas import DataFrame, Index, Panel
INTERNAL ERROR: should have tag
error while running BxlServer: caught exception: Error communicating between BxlServer and client: 0x000000e9
STDOUT message(s) from external script:
Express Edition will continue to be enforced.
Warning: numpy.int64 data type is not supported. Data is converted to float64.
Warning: numpy.int64 data type is not supported. Data is converted to float64.
SqlSatelliteCall function failed. Please see the console output for more information.
Traceback (most recent call last):
STDOUT message(s) from external script:
File "C:\SQL\MSSQL14.SQL2017\PYTHON_SERVICES.3.7\lib\site-packages\revoscalepy\computecontext\RxInSqlServer.py", line 605, in rx_sql_satellite_call
rx_native_call("SqlSatelliteCall", params)
File "C:\SQL\MSSQL14.SQL2017\PYTHON_SERVICES.3.7\lib\site-packages\revoscalepy\RxSerializable.py", line 375, in rx_native_call
ret = px_call(functionname, params)
RuntimeError: The type numpy.ndarray(numpy.ustr) for column1 is not supported.
I have no idea where to start. When I simply do the below, it does not error, but NaN values are replaced with 'nan' and so come back as the literal string rather than NULL in SQL Server, which is not what I want. Hopefully someone has some insight into what is going on. I searched and nothing relevant came up.
df.column1 = df.column1.astype(str)
Edit:
A simpler example seems to show that this occurs when the first value in the series is NaN.
declare @script nvarchar(max) = N'
import os
import datetime
import numpy as np
import pandas as pd
df = pd.DataFrame([[np.NaN, "a", "b"],["w","x",np.NaN],[1, 2, 3]])
df.columns = ["a","b","c"]
print(df.head())
'
execute sp_execute_external_script
@language = N'Python',
@script = @script,
@output_data_1_name = N'df'
with result sets ((
a varchar(100) null
,b varchar(100) null
,c varchar(100) null
))
I believe this happens because np.NaN is not a string and therefore cannot be converted to varchar.
Try casting the DataFrame values to str, for instance in your latest example:
df["a"] = df["a"].apply(str)
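Note that `apply(str)` also turns NaN into the text 'nan', which the question says is not wanted. A possible alternative is sketched below, using a hypothetical helper `to_sql_text` (not part of pandas) that converts only the non-null values and stores `None` for the nulls. It assumes sp_execute_external_script returns `None` in an object column as NULL, which was not verified here.

```python
import numpy as np
import pandas as pd

def to_sql_text(series):
    """Hypothetical helper: stringify non-null values, use None for nulls."""
    return pd.Series(
        [None if pd.isnull(v) else str(v) for v in series],
        index=series.index,
        dtype=object,
    )

# Same data as the example in the question.
df = pd.DataFrame(
    [[np.nan, "a", "b"], ["w", "x", np.nan], [1, 2, 3]],
    columns=["a", "b", "c"],
)

# Plain apply(str): the NaN becomes the text 'nan'.
plain = df["a"].apply(str)
print(plain.tolist())

# Helper: every column ends up holding only str values or None.
for name in df.columns:
    df[name] = to_sql_text(df[name])
print(df["a"].tolist())
print(df["c"].tolist())
```

If the assumption about `None` holds, each column then contains a single value type plus nulls, so the position of the first NaN should no longer matter.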