Inserting large amounts of data into Hive (around 64MB)

OK, so I have a Hive table on a remote Hadoop node set up on a Linux machine. I'm having an issue when attempting to insert a large JSON string, large as in possibly 64MB or more, given that MapReduce won't work well unless I approach that limit. I've successfully transferred 8-9MB, but that's as high as it goes; if I attempt any more, the query fails. I also had to override C#'s default JSON serializer to do this, which I know isn't good practice, but I really don't know any other way to do this.
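
(As a side note, before blaming Hive it may be worth ruling out ASP.NET's own upload caps: httpRuntime maxRequestLength defaults to 4096 KB and IIS's maxAllowedContentLength to roughly 30MB, and either one rejects a large POST before it ever reaches the controller. Since 8-9MB already gets through, maxRequestLength has presumably been raised, but the IIS default would still sit well below 64MB. A minimal diagnostic sketch, assuming the app is hosted under classic ASP.NET (System.Web), reads the effective limit at runtime:

using System.Configuration;
using System.Web.Configuration;

var runtime = (HttpRuntimeSection)ConfigurationManager.GetSection("system.web/httpRuntime");
// MaxRequestLength is expressed in kilobytes; the default of 4096 (4MB)
// would reject a large body long before the Hive insert ever runs.
System.Diagnostics.Debug.WriteLine("maxRequestLength (KB): " + runtime.MaxRequestLength);

If that limit is already high enough, the bottleneck is on the Hive/ODBC side.)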

Anyway, this is how I store data in Hive:

using System;
using System.Data.Odbc;
using System.IO;
using System.Web.Mvc;

namespace HadoopWebService.Controllers
{
    public class LogsController : Controller
    {
        // POST: HadoopRequest
        [HttpPost]
        public ContentResult Create(string json)
        {
            OdbcConnection hiveConnection = new OdbcConnection("DSN=Hadoop Server;UID=XXXX;PWD=XXXX");
            hiveConnection.Open();

            // Read the raw JSON payload from the request body
            // (the bound "json" parameter is not used).
            Stream req = Request.InputStream;
            req.Seek(0, SeekOrigin.Begin);
            string request = new StreamReader(req).ReadToEnd();

            try
            {
                // The payload is concatenated straight into the statement,
                // so the SQL text itself grows to the size of the JSON.
                string query = "INSERT INTO TABLE error_log (json_error_log) VALUES('" + request + "')";
                OdbcCommand command = new OdbcCommand(query, hiveConnection);
                command.ExecuteNonQuery();
                return new ContentResult { Content = "{status: 1}", ContentType = "application/json" };
            }
            catch (Exception error)
            {
                System.Diagnostics.Debug.WriteLine(error.Message);
                return new ContentResult { Content = "{status: 0, message:" + error.ToString() + "}" };
            }
            finally
            {
                hiveConnection.Close();
            }
        }
    }
}
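
One variant that sidesteps the size of the statement text, assuming the Hive ODBC driver supports parameter binding (not all versions do): bind the JSON as an ODBC parameter instead of concatenating it into the SQL string. The statement then stays a few dozen bytes regardless of payload size, and quoting inside the JSON stops being a problem. A sketch, using the same hiveConnection and request as above:

string query = "INSERT INTO TABLE error_log (json_error_log) VALUES (?)";
using (OdbcCommand command = new OdbcCommand(query, hiveConnection))
{
    // Bind the payload as a positional parameter; the driver ships it
    // separately from the (now tiny) SQL text.
    command.Parameters.Add("@json", OdbcType.Text).Value = request;
    command.ExecuteNonQuery();
}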

Is there some setting I can use to insert larger amounts of data? I assume there must be some buffer that fails to load everything. I've searched on Google but haven't found anything, mainly because this probably isn't the proper way to insert into Hadoop, but I'm really out of options right now: I can't use HDInsight, and all I have is the ODBC connection.
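
If no single statement will take the full payload, one workaround that stays entirely within ODBC is to split the JSON into pieces below the observed 8-9MB ceiling and insert them as multiple rows keyed by a sequence number, reassembling them on read by ordering on that key. This is only a sketch under that assumption: the error_log_chunks table and its columns are hypothetical, and since each Hive INSERT ... VALUES launches its own job, a 64MB payload in 4MB pieces means sixteen jobs:

// Hypothetical staging table: error_log_chunks(chunk_seq INT, json_fragment STRING)
const int chunkSize = 4 * 1024 * 1024; // in characters; keeps each statement well under the ~8MB ceiling

for (int offset = 0, seq = 0; offset < request.Length; offset += chunkSize, seq++)
{
    string fragment = request.Substring(offset, Math.Min(chunkSize, request.Length - offset));
    string sql = "INSERT INTO TABLE error_log_chunks (chunk_seq, json_fragment) VALUES (?, ?)";
    using (OdbcCommand command = new OdbcCommand(sql, hiveConnection))
    {
        command.Parameters.Add("@seq", OdbcType.Int).Value = seq;
        command.Parameters.Add("@fragment", OdbcType.Text).Value = fragment;
        command.ExecuteNonQuery();
    }
}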

EDIT: This is the error I get:

System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000] [Microsoft][HiveODBC] (35) Error from Hive: error code: '0' error message: 'ExecuteStatement finished with operation state: ERROR_STATE'.
   at System.Data.Odbc.OdbcConnection.HandleError(OdbcHandle hrHandle, RetCode retcode)
   at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader, Object[] methodArguments, SQL_API odbcApiMethod)
   at System.Data.Odbc.OdbcCommand.ExecuteReaderObject(CommandBehavior behavior, String method, Boolean needReader)
   at System.Data.Odbc.OdbcCommand.ExecuteNonQuery()

Tags: hadoop, hive, odbc
asked on Stack Overflow Nov 9, 2015 by Argus • edited Nov 9, 2015 by Argus

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0