I am trying a regression challenge from Kaggle, the ASHRAE Great Energy Predictor: https://www.kaggle.com/c/ashrae-energy-prediction
I have cleaned and preprocessed the data and am now attempting to apply the XGBoost algorithm to it. A sample of the data is shown below (the target variable I am predicting is meter_reading).
                                          0                    1                    2                    3                    4
site_id                                   0                    0                    0                    0                    0
building_id                               7                   31                   55                   96                  103
primary_use                       Education            Education               Office  Lodging/residential            Education
square_feet                          121074                61904                16726               200933                21657
meter                          chilledwater         chilledwater         chilledwater         chilledwater         chilledwater
timestamp               2016-02-29 09:00:00  2016-02-29 09:00:00  2016-02-29 09:00:00  2016-02-29 09:00:00  2016-02-29 09:00:00
meter_reading                       1857.26              1097.47              337.683              1266.31              337.683
meter_reading_roll_avg              2219.77              1719.04              510.663              2245.43               349.27
outlier_ratio                      0.836691             0.638421             0.661264             0.563951             0.966825
air_temperature                        12.8                 12.8                 12.8                 12.8                 12.8
dew_temperature                         8.9                  8.9                  8.9                  8.9                  8.9
sea_level_pressure                   1021.9               1021.9               1021.9               1021.9               1021.9
wind_speed                                0                    0                    0                    0                    0
hour                                      9                    9                    9                    9                    9
weekday                                   0                    0                    0                    0                    0
month                                     2                    2                    2                    2                    2
wind_compass                          North                North                North                North                North
HDD                                     5.2                  5.2                  5.2                  5.2                  5.2
CDD                                       0                    0                    0                    0                    0
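For reference, here is roughly what my training loop looks like (reconstructed around the snippet visible in the traceback below; the XGBRegressor hyperparameters here are placeholders, and the categorical columns are assumed to have already been numerically encoded):

import xgboost as xgb
from sklearn.model_selection import KFold

# Placeholder hyperparameters -- my real settings may differ
model = xgb.XGBRegressor(n_estimators=100, max_depth=6)

# K-fold cross-validation over the full dataset
kf = KFold(n_splits=5)
for train_index, test_index in kf.split(X):
    print("Test Index: ", test_index)
    X_train, X_test = X.values[train_index], X.values[test_index]
    y_train, y_test = y.values[train_index], y.values[test_index]
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    predictions = [round(value) for value in y_pred]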
When I run it with around 10,000 samples, the algorithm works and I get a result. When I run it with 400k+ samples, I get the following error:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-11-ff1b424a4002> in <module>
6 print("Test Index: ", test_index)
7 X_train, X_test, y_train, y_test = X.values[train_index], X.values[test_index], y.values[train_index], y.values[test_index]
----> 8 model.fit(X_train,y_train)
9 y_pred=model.predict(X_test)
10 predictions = [round(value) for value in y_pred]
~\AppData\Local\Continuum\anaconda3\lib\site-packages\xgboost\sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, callbacks)
822 evals_result=evals_result, obj=obj, feval=feval,
823 verbose_eval=verbose, xgb_model=xgb_model,
--> 824 callbacks=callbacks)
825
826 self.objective = xgb_options["objective"]
~\AppData\Local\Continuum\anaconda3\lib\site-packages\xgboost\training.py in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks)
210 evals=evals,
211 obj=obj, feval=feval,
--> 212 xgb_model=xgb_model, callbacks=callbacks)
213
214
~\AppData\Local\Continuum\anaconda3\lib\site-packages\xgboost\training.py in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
73 # Skip the first update if it is a recovery step.
74 if version % 2 == 0:
---> 75 bst.update(dtrain, i, obj)
76 bst.save_rabit_checkpoint()
77 version += 1
~\AppData\Local\Continuum\anaconda3\lib\site-packages\xgboost\core.py in update(self, dtrain, iteration, fobj)
1367 _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
1368 ctypes.c_int(iteration),
-> 1369 dtrain.handle))
1370 else:
1371 pred = self.predict(dtrain, output_margin=True, training=True)
OSError: [WinError -529697949] Windows Error 0xe06d7363
I think this is because I don't have enough computing power. Here are my specs:

[screenshot of machine specs]
Is there a quick and convenient way to determine when I don't have enough computing power for a dataset/algorithm?
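The only quick check I can think of is comparing a rough estimate of the feature matrix size against available RAM, along the lines of the sketch below (psutil-based; the float32 assumption and the idea that this approximates XGBoost's actual memory needs are my guesses):

import numpy as np
import psutil

# Rough size of a dense float32 copy of the feature matrix
# (a lower bound -- XGBoost's actual peak usage will be higher)
n_rows, n_cols = X.shape
approx_gb = n_rows * n_cols * np.dtype(np.float32).itemsize / 1e9

available_gb = psutil.virtual_memory().available / 1e9
print(f"Feature matrix: ~{approx_gb:.2f} GB")
print(f"Available RAM:  {available_gb:.2f} GB")

But I don't know how reliable that comparison is, or whether there is a more standard approach.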