Part 3: Switching between multiple contexts - no error and a bad exit code

I've been struggling with managing multiple Keras models with tf.Graphs and tf.Sessions for several weeks now. In short, I'd like to have multiple models open and switch between them as needed. This includes training new models, loading existing ones from file, and making predictions.

The bottom line is: (almost) everything works fine until the program crashes with exit code 0xC0000005. No error messages are given. Let me explain.

  • I can load a model and make predictions with it. The results are received, and after printing them, the program crashes.
  • I can load multiple models and make predictions on them. Then the program crashes.
  • I can create a new model, and make predictions on it. Finally, the program crashes.
  • I cannot create two models, not even the same model twice using separate instances of the class below. The program crashes.
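
Exit code 0xC0000005 is the Windows status code for an access violation. It is raised in native code, so the process dies without a Python exception or traceback. A minimal sketch of one way to get at least a Python-level traceback at the moment of the crash, using the standard-library `faulthandler` module. The helper name `describe_exit_code` and the log file name are illustrative assumptions, not part of the original program:

```python
import faulthandler

# Windows NTSTATUS values that commonly show up as process exit codes.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exit_code(code):
    """Return a readable label for a process exit code (hypothetical helper)."""
    unsigned = code & 0xFFFFFFFF  # some tools report the code as a signed 32-bit int
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return "0x{:08X} ({})".format(unsigned, name)


# Dump the Python traceback of all threads if the interpreter hits a fatal
# error such as an access violation. Call this once, early in the program,
# and keep the file object open for the lifetime of the process.
crash_log = open("crash_traceback.log", "w")  # file name is an assumption
faulthandler.enable(file=crash_log, all_threads=True)

print(describe_exit_code(-1073741819))  # prints: 0xC0000005 (STATUS_ACCESS_VIOLATION)
```

The traceback only shows which Python call was active when the native crash happened. It does not explain the cause inside the native library, but it can narrow down which model operation triggers it.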

You get the point. This is how I currently manage the graphs and sessions. I use a context manager to set the created graph and session as defaults and later switch to the previous state.

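# Imports are not shown in the original post; these are assumed
# (standalone Keras on TensorFlow 1.x).
import gc
from contextlib import contextmanager

import tensorflow as tf
from keras import backend as k
from keras.layers import Dense, Dropout
from keras.models import Sequential, model_from_json


class NeuralNetwork: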
    def __init__(self):
        self.graph = tf.Graph()
        self.session = tf.Session(graph=self.graph)
        self.model = None

    def close(self):
        self.session.close()
        del self.graph
        self.graph = None
        gc.collect()

    @contextmanager
    def _context(self):
        prev = k.get_session()
        k.set_session(self.session)
        try:
            with self.graph.as_default(), self.session.as_default():
                yield
        finally:
            # Restore the previous Keras session even if the body raises
            k.set_session(prev)

    def predict(self, x):
        with self._context():
            return self.model.predict(x)

    def fit(self, x_train, y_train, n=20, batch=256):
        with self._context():
            self.model.fit(x_train, y_train, epochs=n, batch_size=batch, verbose=0)

    def create(self, shape):
        with self._context():
            self.model = Sequential()
            self.model.add(Dense(shape[1], input_dim=shape[0], activation='relu'))
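            # NOTE: `drop` (the dropout rate) is not defined in this snippet;
            # it is presumably set elsewhere in the program
            self.model.add(Dropout(drop))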
            self.model.add(Dense(shape[2], activation='sigmoid'))
            self.model.compile(loss='binary_crossentropy', optimizer='rmsprop')

    def load(self, path, sfx=''):
        with open(path / ('architecture' + sfx + '.json'), 'r') as f:
            js = f.read()

        with self._context():
            self.model = model_from_json(js)
            self.model.load_weights(path / ('weights' + sfx + '.h5'))
            self.model.compile(loss='binary_crossentropy', optimizer='rmsprop')

    def save(self, path, sfx=''):
        path.mkdir(exist_ok=True)
        with self._context():
            js = self.model.to_json()
            with open(path / ('architecture' + sfx + '.json'), 'w') as f:
                f.write(js)
            self.model.save_weights(path / ('weights' + sfx + '.h5'))

And with the above class, here's how a network is used elsewhere:

def create(self):
    x, y = [], []
    shape = (15, 30, 1)

    self.predictor = NeuralNetwork()
    self.predictor.create(shape)
    self.predictor.fit(x, y)
    self.predictor.save(path=self.path)
    self.predictor.close()

def load(self):
    self.predictor.load(path=self.path)

def predict(self, x):
    # Executed only on loaded networks, never on created networks
    # due to program structure
    return self.predictor.predict(x)
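
The swap-and-restore idea behind `_context` can be sketched without TensorFlow. In the example below, `_current`, `get_current`, `set_current` and `use_session` are hypothetical stand-ins for the Keras backend's global session, `k.get_session()`, `k.set_session()` and `_context`. The point is only that a `try`/`finally` around the `yield` restores the previous state even when the body raises:

```python
from contextlib import contextmanager

_current = "default-session"  # stand-in for the backend's global session


def get_current():
    return _current


def set_current(value):
    global _current
    _current = value


@contextmanager
def use_session(session):
    """Temporarily make `session` the current one, then restore the previous."""
    prev = get_current()
    set_current(session)
    try:
        yield session
    finally:
        # Runs on normal exit and when the with-body raises
        set_current(prev)


with use_session("model-a"):
    inside = get_current()  # "model-a" while the block is active

try:
    with use_session("model-b"):
        raise RuntimeError("simulated failure during predict")
except RuntimeError:
    pass

after = get_current()  # restored despite the exception
print(inside, after)  # prints: model-a default-session
```

Without the `finally`, an exception inside the `with` body would leave the wrong session installed as the global default for every later call.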

Here are my previous efforts at articulating the problem.

  • Part 1, the one where I had no clue
  • Part 2, the one where I started to figure things out

To the best of my abilities and with the help of some people, I've tried to come up with a way to manage these resources (a context manager, plus "closing" the network after training). But I have not come across documentation or a tutorial describing TensorFlow or Keras resource management in detail.


My goals are two-fold.

  • First and foremost, get rid of this error
  • Hopefully learn the absolutely correct way of dealing with this scenario

If you can help me achieve either one, or even take a small step towards it, I'd greatly appreciate it! In my experience, my struggles are rarely unique, and others have usually thought about them already. So I must just be lacking the proper approach.

python
tensorflow
keras
asked on Stack Overflow Nov 15, 2018 by Felix • edited Nov 15, 2018 by Felix

1 Answer

The issue was resolved by updating all packages to their latest versions. Sadly, I upgraded everything in one go, so I'm not sure what the actual cause was. But I'm willing to bet on TensorFlow.

Here are the package versions most likely involved in producing the error and their updated versions:

  • tensorflow==1.8.0 -> 1.12.0
  • numpy==1.14.5 -> 1.15.4
  • scikit-learn==0.19.1 -> 0.20.0
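
Because several packages were upgraded at once, it can help to record the installed versions before and after an upgrade, so that a later regression can be narrowed down to one package. A minimal sketch using `importlib.metadata`, which is in the standard library from Python 3.8 onwards and so is newer than the setup described here. The helper name `installed_versions` is an assumption:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the list above
report = installed_versions(["tensorflow", "numpy", "scikit-learn"])
for name, version in report.items():
    if version is None:
        print("{} is not installed".format(name))
    else:
        print("{}=={}".format(name, version))
```

Saving this output before and after each individual upgrade makes it possible to upgrade one package at a time and see which change removes the crash.
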
answered on Stack Overflow Nov 15, 2018 by Felix

User contributions licensed under CC BY-SA 3.0