I'm having a weird issue when training models using `tf.Graph` and `tf.Session`. The implementation is somewhat odd, so bear with me while I explain the application structure.
The issue has finally (and somewhat embarrassingly) been resolved by updating all packages.
The application is a service for handling multiple neural networks: training them and making predictions with them. For this reason a single graph wasn't quite enough, so when creating a new model, I first initialise both a `Graph` and a `Session` like so:
```python
def __init__(self):
    self.graph = tf.Graph()
    with self.graph.as_default():
        self.session = tf.Session()
```
These are then used both in the training process and when loading a model from disk.
```python
def fit(self, x_train, y_train, n=200, batch=256):
    with self.graph.as_default():
        with self.session.as_default():
            self.model.fit(x_train, y_train, epochs=n, batch_size=batch, verbose=0)
```
This is where the problem occurs (I've commented things out one by one, and the `fit` method is the culprit), but for further context, here is the (stripped-down) creation method as well. It uses Keras.
```python
def create(self):
    with self.graph.as_default():
        with self.session.as_default():
            self.model = Sequential()
            self.model.add(Dense(64, input_dim=shape[0], activation='relu',
                                 kernel_regularizer=reg.l1_l2(0.1, 0.2)))
            self.model.add(Dropout(0.5))
            self.model.add(Dense(1, activation='sigmoid',
                                 kernel_regularizer=reg.l1_l2(0.1, 0.2)))
            self.model.compile(loss='binary_crossentropy', optimizer='rmsprop',
                               metrics=['accuracy'])
```
When initialising the network and fitting it with data, the process exits with a bad exit code: `0xC0000005`. This doesn't reveal much about the problem itself, and the bad code only appears on exit: even a `print` statement placed after the routines executes successfully. This has led me to suspect it's not a problem with the implementation, but something else.
I'm using Python 3.6.5 on PyCharm, but the problem has occurred even when executing from the command line. As I said, multiple models are juggled around, but a single training run is enough to crash.
What could possibly be at fault here? I realise it's not a very reproducible problem, but any pointers, even towards debugging it, would be greatly appreciated.
I tried modifying the `fit` function according to this answer, but with no luck. Here's the modified version:
```python
from keras import backend as K
import gc

def fit(self, x_train, y_train, n=20, batch=256):
    K.set_session(self.session)
    with self.graph.as_default():
        with self.session.as_default():
            self.model.fit(x_train, y_train, epochs=n*10, batch_size=batch, verbose=0)
    K.clear_session()
    gc.collect()
```
Next I tried creating a new session for each computation (`tf.Session(graph=self.graph)`). This worked when combined with the `gc.collect()` call, but after training the model, I could not make predictions with a new session:

```
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value dense_1/bias
```
Currently (Nov 8th) I'm releasing all possible resources before creating and before loading a model. This has had the effect that I can create the model once, but the second time around (I do two training passes to evaluate the model independently) the program crashes like before. Let's try a new question, this is getting out of hand: Q v.2