I'm having a weird issue when training models using `tf.Graph` and `tf.Session`. The implementation is somewhat odd, so bear with me while I explain the application structure.
The issue has finally (and somewhat embarrassingly) been resolved by updating all packages.
The application is a service for handling multiple neural networks: training them and making predictions with them. For this reason a single graph wasn't quite enough, so when creating a new model, I first initialise both a `Graph` and a `Session` like so:
```python
def __init__(self):
    self.graph = tf.Graph()
    with self.graph.as_default():
        self.session = tf.Session()
```
These are then used both in the training process and when loading a model from disk.
```python
def fit(self, x_train, y_train, n=200, batch=256):
    with self.graph.as_default():
        with self.session.as_default():
            self.model.fit(x_train, y_train, epochs=n, batch_size=batch, verbose=0)
```
This is where the problem occurs (I've commented things out one by one, and the `fit` method is the culprit), but for further context, here is the (stripped-down) creation method as well. It uses Keras.
```python
def create(self):
    with self.graph.as_default():
        with self.session.as_default():
            self.model = Sequential()
            self.model.add(Dense(64, input_dim=shape[0], activation='relu',
                                 kernel_regularizer=reg.l1_l2(0.1, 0.2)))
            self.model.add(Dropout(0.5))
            self.model.add(Dense(1, activation='sigmoid',
                                 kernel_regularizer=reg.l1_l2(0.1, 0.2)))
            self.model.compile(loss='binary_crossentropy', optimizer='rmsprop',
                               metrics=['accuracy'])
```
When initialising the network and fitting it with data, the process exits with a bad exit code: `0xC0000005`. This doesn't reveal much about the problem itself, and the bad code only appears on exit: even a `print` statement placed after the routines executes successfully. This has led me to suspect it's not a problem with the implementation, but something else.
I'm using Python 3.6.5 on PyCharm, but the problem has occurred even when executing from the command line. As I said, multiple models are juggled around, but a single training run is enough to crash.
What could possibly be at fault here? I realise it's not a very reproducible problem, but any pointers, even towards debugging it, would be greatly appreciated.
I tried modifying the `fit` function according to this answer, but with no luck. Here's the modified version:
```python
from keras import backend as K
import gc

def fit(self, x_train, y_train, n=20, batch=256):
    K.set_session(self.session)
    with self.graph.as_default():
        with self.session.as_default():
            self.model.fit(x_train, y_train, epochs=n*10, batch_size=batch, verbose=0)
    K.clear_session()
    gc.collect()
```
Next I tried creating a new session for each computation (`tf.Session(graph=self.graph)`). This worked when combined with the `gc.collect()` call, but after training the model, I could not make predictions with a new session:

```
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value dense_1/bias
```
Currently (Nov 8th) I'm releasing all possible resources before creating and before loading a model. This has had the effect that I can create the model once, but the second time around (I do two training passes to evaluate the model independently) the program crashes like before. Let's try a new question, this is getting out of hand: Q v.2