I am trying to read data from Delta format tables using C# via the Simba ODBC driver. Delta format table sample: https://docs.delta.io/latest/quick-start.html#-create-a-table&language-python
I have downloaded and configured the Simba ODBC driver as instructed in https://www.simba.com/products/Spark/doc/ODBC_InstallGuide/mac/content/odbc/configuring/drivermanager.htm
After this configuration I am able to connect to the Spark Thrift Server successfully. However, I am unable to read data from the Delta format tables.
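For reference, the connection is created like this; a minimal sketch, assuming a DSN named Spark configured in odbc.ini (the DSN name and credentials are placeholders):

using System;
using System.Data.Odbc;

// Minimal sketch: open an ODBC connection through a DSN configured for the
// Simba Spark ODBC driver. "Spark", the user, and the password below are
// placeholders; substitute whatever you configured in odbc.ini.
string connectionString = "DSN=Spark;UID=user;PWD=password";
OdbcConnection dbConnection = new OdbcConnection(connectionString);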
using (dbConnection)
{
    dbConnection.Open();
    Console.WriteLine("Connection Open!");

    // Run a simple query against the Delta table
    OdbcCommand dbCommand = dbConnection.CreateCommand();
    dbCommand.CommandText = "SELECT * FROM accnt LIMIT 10";

    OdbcDataReader dbReader = dbCommand.ExecuteReader();
    if (dbReader.HasRows)
    {
        while (dbReader.Read())
        {
            // Print the first three columns of each row
            Console.WriteLine("{0}\t{1}\t{2}",
                dbReader.GetString(0), dbReader.GetString(1), dbReader.GetString(2));
        }
    }
    else
    {
        Console.WriteLine("No Rows Found.");
    }
    Console.WriteLine("Connection Close!");
}
The error message is:
Unhandled exception. System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000] [Simba][Hardy] (35) Error from server: error code: '0' error message: 'Error running query: java.lang.ClassNotFoundException: DELTA.DefaultSource'.
at System.Data.Odbc.OdbcConnection.HandleError(OdbcHandle hrHandle, RetCode retcode)
at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader, Object[] methodArguments, SQL_API odbcApiMethod)
at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader)
at System.Data.Odbc.OdbcCommand.ExecuteReader(CommandBehavior behavior)
at System.Data.Odbc.OdbcCommand.ExecuteReader()
at cdp_deltalake_poc.Program.Main(String[] args)
You need to add the configuration options and the Delta Lake package when starting the Thrift Server. This is done the same way as submitting applications to Spark. For example:
sbin/start-thriftserver.sh \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
--packages 'io.delta:delta-core_2.12:0.8.0'
You need to adjust the version of the Delta Lake package depending on your Spark version (I used 0.8.0 with Spark 3.0.1).
After that I can query the data that I've created.
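As a quick sanity check from the C# side, the query from the question should now return rows; a minimal sketch, reusing the dbConnection and the accnt table from above:

// Sanity check after restarting the Thrift Server with the Delta package:
// count the rows in the table (connection setup as in the question).
using (OdbcCommand check = dbConnection.CreateCommand())
{
    check.CommandText = "SELECT COUNT(*) FROM accnt";
    Console.WriteLine("Row count: {0}", check.ExecuteScalar());
}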