ML.NET TensorFlow image classification crashes with SEHException when run in Azure

I'm using ML.NET with TensorFlow in an ASP.NET Core C# app for image background removal (similar to the implementations at https://github.com/susheelsk/image-background-removal and https://github.com/OPHoperHPO/image-background-remove-tool).
The TensorFlow model used is the DeepLabV3 xception_model from http://download.tensorflow.org/models/deeplabv3_pascal_train_aug_2018_01_04.tar.gz
Locally it runs without errors (at least, I have not been able to reproduce the problem locally).
But when the app runs as an App Service in Azure, it sometimes starts crashing with an SEHException when calling the PredictionEnginePool Predict method:


System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception.
   at Tensorflow.c_api.TF_SessionRun(IntPtr session, TF_Buffer* run_options, TF_Output[] inputs, IntPtr[] input_values, Int32 ninputs, TF_Output[] outputs, IntPtr[] output_values, Int32 noutputs, IntPtr[] target_opers, Int32 ntargets, IntPtr run_metadata, IntPtr status)
   at Microsoft.ML.TensorFlow.TensorFlowUtils.Runner.Run()
   at Microsoft.ML.Transforms.TensorFlowTransformer.Mapper.UpdateCacheIfNeeded(Int64 position, ITensorValueGetter[] srcTensorGetters, String[] activeOutputColNames, OutputCache outputCache)
   at Microsoft.ML.Transforms.TensorFlowTransformer.Mapper.<>c__DisplayClass9_01.<MakeGetter>b__4(VBuffer1& dst)
   at Microsoft.ML.Data.TypedCursorable1.TypedRowBase.<>c__DisplayClass8_01.b__0(TRow row)
   at Microsoft.ML.Data.TypedCursorable1.TypedRowBase.FillValues(TRow row)
   at Microsoft.ML.Data.TypedCursorable1.RowImplementation.FillValues(TRow row)
   at Microsoft.ML.PredictionEngineBase2.FillValues(TDst prediction)
   at Microsoft.ML.PredictionEngine2.Predict(TSrc example, TDst& prediction)
   at Microsoft.ML.PredictionEngineBase2.Predict(TSrc example)
   at Microsoft.Extensions.ML.PredictionEnginePoolExtensions.Predict[TData,TPrediction](PredictionEnginePool2 predictionEnginePool, String modelName, TData example)
   at...


How can I investigate it more deeply to get more insights on this?
What could be causing this issue?

One more note: the issue disappears after restarting the service and its web jobs.

c#
tensorflow
azure-web-sites
ml.net
asked on Stack Overflow Jul 17, 2020 by Oksana • edited Jul 23, 2020 by Oksana

1 Answer

How can I investigate it more deeply to get more insights on this?

You might be able to get a meaningful error code.

SEHException derives from ExternalException and has an ErrorCode property, which is simply an HRESULT as defined in Winerror.h.

Common HRESULT values are listed here:

https://docs.microsoft.com/en-us/windows/win32/seccrypto/common-hresult-values
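
As a rough illustration of reading that ErrorCode, here is a minimal sketch that wraps a prediction call, logs the HRESULT in hex, and rethrows. The SehDiagnostics class and its Describe and RunWithDiagnostics methods are hypothetical helper names made up for this example; they are not part of ML.NET. Only SEHException, ExternalException and ErrorCode come from the framework.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of ML.NET or the original question.
public static class SehDiagnostics
{
    // Formats the HRESULT carried by an ExternalException
    // (SEHException derives from ExternalException).
    public static string Describe(ExternalException ex)
    {
        // X8 prints the 32-bit HRESULT as eight hex digits, e.g. 80004005.
        return $"{ex.GetType().Name}: HRESULT=0x{ex.ErrorCode:X8}, Message={ex.Message}";
    }

    // Runs the supplied delegate; if native code raises an SEHException,
    // passes a description to the log callback and rethrows.
    public static T RunWithDiagnostics<T>(Func<T> predict, Action<string> log)
    {
        try
        {
            return predict();
        }
        catch (SEHException ex)
        {
            log(Describe(ex));
            throw; // keep the original stack trace for the caller
        }
    }
}
```

It could be used around the failing call, for example SehDiagnostics.RunWithDiagnostics(() => pool.Predict(input), Console.Error.WriteLine), where pool and input stand in for your own PredictionEnginePool and input object. Note that the trace in the question already shows 0x80004005, which is E_FAIL (unspecified failure), so the code alone may not narrow things down much in this case.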

answered on Stack Overflow Jul 17, 2020 by Michael Randall

User contributions licensed under CC BY-SA 3.0