I am trying to use GridSearchCV with XGBoost to find the best parameters and fit a fairly large dataset (> 500 MB), but I am getting an error that I am unable to resolve. My laptop has 32 GB of RAM, and Python's memory consumption did not exceed 10 GB while I left it running, so I don't think it is a memory-related issue, although I am not sure.
I do not even know whether I can get past this kind of error by using a different model or some other approach. Maybe the problem is with the grid search itself, but I am not sure how to get around it without using grid search.
Here is the code:
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Five candidate values for each of five hyperparameters
parameters = {'learning_rate': [0.03, 0.04, 0.05, 0.06, 0.07],
              'max_depth': [3, 5, 7, 9, 11],
              'min_child_weight': [2, 4, 6, 8, 10],
              'subsample': [0.9, 0.8, 0.7, 0.6, 0.5],
              'colsample_bytree': [0.9, 0.8, 0.7, 0.6, 0.5]}

xgb_model = XGBClassifier(objective='multi:softprob', nthread=4, n_estimators=1000, seed=1337, silent=0)
clf = GridSearchCV(estimator=xgb_model, param_grid=parameters, n_jobs=-1, verbose=5)

# orders_prior1 is a pandas DataFrame that is already loaded at this point
features = ['user_id', 'order_number', 'order_dow', 'order_hour_of_day', 'days_since_prior_order']
clf.fit(orders_prior1[features], orders_prior1['product_id'], orders_prior1['user_order'])
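For a sense of scale, here is a rough count of how much work this grid asks for. It is only a sketch: the fold count of 3 is my assumption (I did not set `cv`, and the default depends on the scikit-learn version), and `cv_folds`, `n_candidates` and `n_fits` are names I made up for the illustration.

```python
from math import prod  # Python 3.8+

# Same grid as in the question
parameters = {'learning_rate': [0.03, 0.04, 0.05, 0.06, 0.07],
              'max_depth': [3, 5, 7, 9, 11],
              'min_child_weight': [2, 4, 6, 8, 10],
              'subsample': [0.9, 0.8, 0.7, 0.6, 0.5],
              'colsample_bytree': [0.9, 0.8, 0.7, 0.6, 0.5]}

# Grid search tries every combination of the listed values
n_candidates = prod(len(values) for values in parameters.values())

cv_folds = 3        # ASSUMPTION: default fold count, since cv was not set
n_estimators = 1000  # from the XGBClassifier call in the question

n_fits = n_candidates * cv_folds          # one model fit per candidate per fold
n_boosting_rounds = n_fits * n_estimators  # total boosting rounds across all fits

print(n_candidates, n_fits, n_boosting_rounds)
```

So even before the final refit on the full data, this setup requests several thousand separate model fits, each running on its own copy of the training fold when `n_jobs=-1` spreads them across processes.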
Here is the error:
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 344, in __call__
    return self.func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 238, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Anaconda3\lib\site-packages\xgboost\sklearn.py", line 445, in fit
    verbose_eval=verbose)
  File "C:\Anaconda3\lib\site-packages\xgboost\training.py", line 205, in train
    xgb_model=xgb_model, callbacks=callbacks)
  File "C:\Anaconda3\lib\site-packages\xgboost\training.py", line 76, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Anaconda3\lib\site-packages\xgboost\core.py", line 806, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, iteration, dtrain.handle))
OSError: [WinError -529697949] Windows Error 0xe06d7363
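The two numbers in the last line look different but are the same 32-bit value, once read as signed and once as unsigned. A small sketch to check that (`to_signed32` is a helper I wrote for the illustration, not part of any library):

```python
def to_signed32(value):
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value

code = 0xE06D7363  # the hex code from the OSError message

# The signed reading matches the number shown after "WinError"
print(to_signed32(code))  # -529697949

# The low three bytes are printable ASCII
tag = code.to_bytes(4, 'big')[1:].decode('ascii')
print(tag)  # msc
```

As far as I can tell, 0xE06D7363 is the code the Microsoft Visual C++ runtime uses when a C++ exception is thrown, which would mean the failure comes from inside the native XGBoost library (the `XGBoosterUpdateOneIter` call) rather than from the Python code. The error message itself does not say which exception it was.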
I posted a very similar question a few days ago (using a neural network classifier instead) but did not get a useful answer. Any ideas on why this kind of error happens and how to resolve it? Although Python has become a key language for data science, I am open to trying a different language if that would get around the problem. Thanks in advance.