I use PyCharm to run my script. The script loops; on each iteration it: 1. selects a dataset, 2. trains a new Keras model, and 3. evaluates that model.
The code worked perfectly for two weeks, but after I installed a new Anaconda environment, it suddenly fails after two iterations of the loop.
Two Siamese neural network models train perfectly fine, and right before the third loop it crashes with Process finished with exit code -1073741819 (0xC0000005).
1/32 [..............................] - ETA: 0s - loss: 0.5075
12/32 [==========>...................] - ETA: 0s - loss: 0.5112
27/32 [========================>.....] - ETA: 0s - loss: 0.4700
32/32 [==============================] - 0s 4ms/step - loss: 0.4805
eval run time : 0.046851396560668945
For LOOCV run 2 out of 32. Model is SNN. Time taken for instance = 6.077638149261475
Post-training results:
acc = 1.0 , ce = 0.6019332906978302 , f1 score = 1.0 , mcc = 0.0
cm =
[[1]]
####################################################################################################
Process finished with exit code -1073741819 (0xC0000005)
The strange thing is that the code used to work perfectly fine, and even when I switch back from the new Anaconda environment to the previous one I used, it still exits with the same exit code.
When I use another type of model (a dense neural network), it also crashes, but after 4 iterations. Is it something to do with running out of memory? This is an example of the loop. The exact model does not matter; it always crashes after a certain number of loops at the train-model line (between points 2 and 3).
# Run k model instances to perform skf
predicted_labels_store = []
acc_store = []
ce_store = []
f1s_store = []
mcc_store = []
folds = []
val_features_c = []
val_labels = []
for fold, fl_tuple in enumerate(fl_store):
    instance_start = time.time()
    (ss_fl, i_ss_fl) = fl_tuple  # ss_fl is the training fl, i_ss_fl is the validation fl
    if model_mode == 'SNN':
        # Run SNN
        model = SNN(hparams, ss_fl.features_c_dim)
        loader = Siamese_loader(model.siamese_net, ss_fl, hparams)
        loader.train(loader.hparams.get('epochs', 100), loader.hparams.get('batch_size', 32),
                     verbose=loader.hparams.get('verbose', 1))
        predicted_labels, acc, ce, cm, f1s, mcc = loader.eval(i_ss_fl)
        predicted_labels_store.extend(predicted_labels)
        acc_store.append(acc)
        ce_store.append(ce)
        f1s_store.append(f1s)
        mcc_store.append(mcc)
    elif model_mode == 'cDNN':
        # Run DNN
        print('Point 1')
        model = DNN_classifer(hparams, ss_fl)
        print('Point 2')
        model.train_model(ss_fl)
        print('Point 3')
        predicted_labels, acc, ce, cm, f1s, mcc = model.eval(i_ss_fl)
        predicted_labels_store.extend(predicted_labels)
        acc_store.append(acc)
        ce_store.append(ce)
        f1s_store.append(f1s)
        mcc_store.append(mcc)
    del model
    K.clear_session()
    instance_end = time.time()
    if cv_mode == 'skf':
        print('\nFor k-fold run {} out of {}. Model is {}. Time taken for instance = {}\n'
              'Post-training results: \nacc = {} , ce = {} , f1 score = {} , mcc = {}\ncm = \n{}\n'
              '####################################################################################################'
              .format(fold + 1, k_folds, model_mode, instance_end - instance_start, acc, ce, f1s, mcc, cm))
    else:
        print('\nFor LOOCV run {} out of {}. Model is {}. Time taken for instance = {}\n'
              'Post-training results: \nacc = {} , ce = {} , f1 score = {} , mcc = {}\ncm = \n{}\n'
              '####################################################################################################'
              .format(fold + 1, fl.count, model_mode, instance_end - instance_start, acc, ce, f1s, mcc, cm))
    # Prepare the output dataframe containing all validation examples and their predicted labels
    folds.extend([fold] * i_ss_fl.count)  # Column with the fold number for each example
    # Use len() here: comparing a numpy array with `!= []` is ambiguous
    val_features_c = (np.concatenate((val_features_c, i_ss_fl.features_c_a), axis=0)
                      if len(val_features_c) else i_ss_fl.features_c_a)
    val_labels.extend(i_ss_fl.labels)
K.clear_session()
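To check whether memory really is growing across iterations, one option is to log allocation totals each loop with the standard-library tracemalloc module. A minimal sketch, where `train_one_fold` is a hypothetical stand-in for the actual training step above:

```python
import gc
import tracemalloc

def train_one_fold():
    # Hypothetical stand-in for one training iteration; replace with
    # the actual SNN/cDNN training code from the loop above.
    return [0.0] * 100_000

tracemalloc.start()
for fold in range(3):
    result = train_one_fold()
    del result
    gc.collect()
    current, peak = tracemalloc.get_traced_memory()
    # If `current` keeps climbing fold after fold, something is leaking.
    print(f"Fold {fold + 1}: current = {current / 1e6:.2f} MB, peak = {peak / 1e6:.2f} MB")
tracemalloc.stop()
```

Note that tracemalloc only sees Python-level allocations; memory held by TensorFlow's C++ runtime will not show up here, so a flat tracemalloc curve alongside a growing process footprint would point at the TF session itself.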
And the corresponding output for the dense neural network before it crashes:
For LOOCV run 4 out of 32. Model is cDNN. Time taken for instance = 0.7919328212738037
Post-training results:
acc = 0.0 , ce = 0.7419472336769104 , f1 score = 0.0 , mcc = 0.0
cm =
[[0 1]
[0 0]]
####################################################################################################
Point 1
Point 2
Process finished with exit code -1073741819 (0xC0000005)
Any help is greatly appreciated, thank you!
Below is the explanation for the suggestions from the comments that worked, in case anyone faces the same issue.
Manually set the Keras session at the start of each loop rather than using the default one:
sess = tf.Session()  # TF 1.x API; in TF 2.x this is tf.compat.v1.Session()
K.set_session(sess)
# ..... train your model
K.clear_session()
Delete the loader variable as well, since that object must be holding a reference to the original model object; as you can see, you are calling train() on it.
Explicitly collect the memory released by deleting these variables, using gc.collect() after each loop, so that there is enough memory for building the next model.
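The last two points can be seen in a pure-Python sketch, where `Model` and `Loader` are hypothetical stand-ins for the Keras model and `Siamese_loader`: as long as `loader` is alive, deleting `model` alone does not free it.

```python
import gc
import weakref

class Model:
    """Hypothetical stand-in for a Keras model."""
    pass

class Loader:
    """Hypothetical stand-in for Siamese_loader; keeps a reference to the model."""
    def __init__(self, model):
        self.model = model

model = Model()
loader = Loader(model)
ref = weakref.ref(model)  # lets us observe whether the model has been freed

del model            # not enough: loader.model still references it
gc.collect()
print(ref() is None)  # False - the model is still alive via loader

del loader           # drop the last reference too
gc.collect()
print(ref() is None)  # True - the model has now been freed
```

This is why `del model` on its own did not help in the loop above: the `loader` created each iteration kept the model alive.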
So, the gist is: when running multiple independent models in a loop like this, make sure you explicitly set the TensorFlow session so that you can clear it after each loop finishes, releasing all the resources that session used. Delete all references that might be tied to TensorFlow objects in the loop, and then collect the freed memory.