Error when trying to use GridSearchCV with XGBClassifier


I am trying to use GridSearchCV with XGBoost to find the best params and then fit on quite a large dataset (> 500 MB). However, I am getting an error that I am unable to resolve. My laptop has 32 GB of RAM, and Python's memory consumption did not exceed 10 GB while I left it running, so I don't think it is a memory-related issue, although I am not sure.

I do not even know whether I can get past this kind of error by using a different model or something else. Maybe it is a problem with grid search, but I am not sure how to get around it without using grid search.

Here is the code:

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Hyperparameter grid: 5 candidate values for each of 5 parameters
parameters = {'learning_rate': [0.03, 0.04, 0.05, 0.06, 0.07],
              'max_depth': [3, 5, 7, 9, 11],
              'min_child_weight': [2, 4, 6, 8, 10],
              'subsample': [0.9, 0.8, 0.7, 0.6, 0.5],
              'colsample_bytree': [0.9, 0.8, 0.7, 0.6, 0.5]}

xgb_model = XGBClassifier(objective='multi:softprob', nthread=4, n_estimators=1000, seed=1337, silent=0)

clf = GridSearchCV(estimator=xgb_model, param_grid=parameters, n_jobs=-1, verbose=5)

# orders_prior1 is defined earlier (not shown)
features = ['user_id', 'order_number', 'order_dow', 'order_hour_of_day', 'days_since_prior_order']
clf.fit(orders_prior1[features], orders_prior1['product_id'], orders_prior1['user_order'])
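
For a sense of scale, the size of the grid above can be counted with a short sketch. The fold count `n_folds = 3` is an assumption (the code does not pass `cv`, so the real value depends on the installed scikit-learn version's default); the other numbers come straight from the code.

```python
from math import prod

# Same grid as in the question
parameters = {'learning_rate': [0.03, 0.04, 0.05, 0.06, 0.07],
              'max_depth': [3, 5, 7, 9, 11],
              'min_child_weight': [2, 4, 6, 8, 10],
              'subsample': [0.9, 0.8, 0.7, 0.6, 0.5],
              'colsample_bytree': [0.9, 0.8, 0.7, 0.6, 0.5]}

n_estimators = 1000  # from the XGBClassifier call in the question
n_folds = 3          # ASSUMPTION: cv is not set in the question

# Number of parameter combinations the grid search will try
n_candidates = prod(len(values) for values in parameters.values())

# Each combination is fitted once per cross-validation fold
n_fits = n_candidates * n_folds

# Each fit runs up to n_estimators boosting rounds
n_boosting_rounds = n_fits * n_estimators

print(n_candidates, n_fits, n_boosting_rounds)
```

Under that assumption the search would run 3125 combinations, 9375 fits, and about 9.4 million boosting rounds. With `multi:softprob`, XGBoost builds one tree per class in each round, so the number of distinct `product_id` values multiplies the work further. In addition, `n_jobs=-1` starts one worker per core while each model also requests `nthread=4`, so the machine may be oversubscribed.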

Here is the error:

Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 344, in __call__
    return self.func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 238, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Anaconda3\lib\site-packages\xgboost\sklearn.py", line 445, in fit
    verbose_eval=verbose)
  File "C:\Anaconda3\lib\site-packages\xgboost\training.py", line 205, in train
    xgb_model=xgb_model, callbacks=callbacks)
  File "C:\Anaconda3\lib\site-packages\xgboost\training.py", line 76, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Anaconda3\lib\site-packages\xgboost\core.py", line 806, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, iteration, dtrain.handle))
OSError: [WinError -529697949] Windows Error 0xe06d7363
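
The two numbers in the last traceback line are the same 32-bit value, once as a signed integer and once in hex. The small sketch below (not part of the original code) shows the conversion. 0xE06D7363 is the exception code the Microsoft Visual C++ runtime uses for a thrown C++ exception, which suggests the failure was raised inside the native XGBoost library rather than in Python code. It does not, by itself, say what the underlying cause was.

```python
# Value reported by Python as "[WinError -529697949]"
signed_code = -529697949

# Reinterpret the signed 32-bit value as unsigned
unsigned_code = signed_code & 0xFFFFFFFF
hex_code = hex(unsigned_code)

# The low three bytes are ASCII characters
low_bytes = unsigned_code.to_bytes(4, "big")[1:].decode("ascii")

print(hex_code, low_bytes)
```

The low three bytes spell "msc", which is how this code is usually recognized.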

I posted a very similar question (using a neural network classifier instead) a few days ago, but did not get a useful answer. Any ideas on why this kind of error happens and how to resolve it? Should I try a different language? Although Python has become a key language for data-science problems, I am open to using a different language to get past this problem. Thanks in advance.

python
classification
xgboost
grid-search
asked on Stack Overflow Aug 20, 2017 by mj1261829

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0