EXCEPTION_ACCESS_VIOLATION (0xc0000005) during training in DL4J


I am working on a DL4J example for cat and dog classification using this Kaggle dataset: https://www.kaggle.com/c/dogs-vs-cats/data. Currently I only train my model without testing it. After the training starts and runs for some time, it throws this error:

A fatal error has been detected by the Java Runtime Environment:

EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffd2ca1a799, pid=12580, tid=0x000000000000071c

JRE version: Java(TM) SE Runtime Environment (8.0_251-b08) (build 1.8.0_251-b08)
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.251-b08 mixed mode windows-amd64 compressed oops)
Problematic frame:
C  [KERNELBASE.dll+0x3a799]

Failed to write core dump. Minidumps are not enabled by default on client versions of Windows

An error report file with more information is saved as: C:\Users\username\Documents\AIProject\AiProjectJava\hs_err_pid12580.log

If you would like to submit a bug report, please visit: http://bugreport.java.com/bugreport/crash.jsp
The crash happened outside the Java Virtual Machine in native code. See problematic frame for where to report the bug.

hs_err_pid12580.log is the crash log file, but I cannot figure out from it what the problem is. I thought it had something to do with memory allocation, so I gave the JVM about 20 GB of RAM and implemented a workspace, but that did not really help: the code got through more iterations but still kept throwing this error, this time at random iterations. My current suspicion is that lines 77-88 of the log file (quoted below) are the most relevant, but there is almost zero information about OpenCV in Deeplearning4j, so I do not know what to do next.
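For reference, this is roughly how I launch training with the enlarged heap (the jar and main-class names below are placeholders and the exact values may differ slightly):

    java -Xmx20G -cp AiProjectJava.jar com.example.CatDogMain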

> Stack: [0x0000000057fb0000,0x00000000580b0000],  sp=0x00000000580abd90,  free space=1007k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> C  [KERNELBASE.dll+0x3a799]
> C  [vcruntime140.dll+0x3351]
> C  [ntdll.dll+0xa0616]
> C  [opencv_core430.dll+0x1a2728]
> C  [opencv_imgproc430.dll+0x324394]
> C  [opencv_imgproc430.dll+0x719b2]
> C  [opencv_imgproc430.dll+0x721e7]
> C  [opencv_imgproc430.dll+0x4306a]
> C  [jniopencv_imgproc.dll+0x8914f]
> C  0x0000000002d08c67
Version Information
  • Deeplearning4j version: 1.0.0-beta7
  • Platform information: Windows 10 with AVX2 (the code runs on CPU)

If the pom.xml is needed, I can post it in a comment. At the risk of making the question too long, and in the hope that it helps solve the issue, here is my code:

public static void main(String[] args) throws Exception {
        int seed = 12345;
        int batchSize = 100;
        int height = 100;
        int width = 100;
        int channels = 3;
        Random randNumGen = new Random(seed);
        int outputNum = 2;
        int numEpochs = 2;

        Nd4j.getMemoryManager().togglePeriodicGc(false);

        File trainData = new File("C:\\Users\\username\\Documents\\Projects\\dataset\\PetImages\\Train");
        File testData = new File("C:\\Users\\username\\Documents\\Projects\\dataset\\PetImages\\Test");

        WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
                .initialSize(0)
                .policyLocation(LocationPolicy.MMAP)
                .build();

        // pass the MMAP configuration explicitly; the workspace id string is arbitrary
        try (MemoryWorkspace ws =
                     Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "MMAP_WORKSPACE")) {
            FileSplit train = new FileSplit(trainData, NativeImageLoader.ALLOWED_FORMATS, randNumGen);
            FileSplit test = new FileSplit(testData, NativeImageLoader.ALLOWED_FORMATS, randNumGen);

            ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator();

            ImageRecordReader recordReader = new ImageRecordReader(height, width, channels, labelMaker);
            recordReader.initialize(train);

            DataSetIterator trainIter = new RecordReaderDataSetIterator(recordReader, batchSize, 1, outputNum);

            DataNormalization scaler = new ImagePreProcessingScaler(0, 1);
            scaler.fit(trainIter);
            trainIter.setPreProcessor(scaler);

            log.info("####### Model Build #######");

            MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                    .seed(seed)
                    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                    .l2(1e-4)
                    .list()
                    .layer(0, new DenseLayer.Builder()
                            .nIn(height*width*channels)
                            .nOut(25)
                            .activation(Activation.RELU)
                            .weightInit(WeightInit.XAVIER)
                            .build())
                    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                            .nIn(25)
                            .nOut(outputNum)
                            .activation(Activation.SOFTMAX)
                            .weightInit(WeightInit.XAVIER)
                            .build())
                    .backpropType(BackpropType.Standard)
                    .setInputType(InputType.convolutional(height,width,channels))
                    .build();

            MultiLayerNetwork model = new MultiLayerNetwork(conf);
            model.init();

            model.setListeners(new ScoreIterationListener(1));
            log.info("####### Model Train #######");

            for (int i = 0; i < numEpochs; i++) {
                System.out.println(i);
                model.fit(trainIter);
                System.out.println(i);
            }

        } catch (Exception e) {
            // print the full stack trace instead of swallowing the exception
            e.printStackTrace();
        }
}

Tags: java, deeplearning4j, dl4j
asked on Stack Overflow Jul 3, 2020 by art-m • edited Jul 3, 2020 by art-m

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0