Training custom entities in spaCy crashes with many NER labels *Fixed*


I'm trying to train a new model in spaCy with custom entities, and I'm running into issues when training it.

I have only one pipe (ner), and I add all of my entity types as labels to it.

I found that adding a large number of distinct labels (~219) to the ner pipe makes it crash on the first nlp.update call (Process finished with exit code -1073740791 (0xC0000409)).

I'm running spaCy 2.0.12 with Python 3.7 on a Windows 10 laptop with 16 GB of RAM. Any idea why it crashes on the first nlp.update call once I add more labels, and how I can prevent that? With only ~100 labels it works fine.
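For context, entity_types is just the set of distinct labels taken from the training data (how it is built is not shown here, so this is an assumed sketch; spacy_model is the list shown further down):

# Assumed: collect the distinct entity labels from the training data
# (spacy_model is the list of (text, {"entities": [...]}) tuples shown below)
entity_types = {label
                for _, annotations in spacy_model
                for _start, _end, label in annotations["entities"]}
print(len(entity_types))  # ~219 distinct labels in the failing case; ~100 works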

Here's my code:

import random
import spacy

def __train_model(self, spacy_model, entity_types):
    nlp = spacy.blank("en")

    # One pipe only: a blank NER component
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner)

    # Register every distinct entity type (~219 labels) with the NER pipe
    for entity_type in list(entity_types):
        ner.add_label(entity_type)

    optimizer = nlp.begin_training()

    # Start training
    for i in range(20):
        losses = {}
        random.shuffle(spacy_model)

        for statement, entities in spacy_model:
            # Crashes here on the very first call when ~219 labels are registered
            nlp.update([statement], [entities], sgd=optimizer, losses=losses, drop=0.5)

    return nlp

spacy_model:

[
    ('Simply put I see no other conclusion than Comcast has actively blocked our Smart TVs from accessing Netflix on purpose.', {'entities': [(42, 49, 'ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS'), (75, 80, 'DEVICE:COMMUNICATIONS:TV:FEATURE'), (100, 107, 'ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS')]})
    ...
]
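The entity offsets are plain Python string indices. Not related to the crash itself, but a quick way to sanity-check that they line up with the text and with token boundaries (a sketch, assuming spaCy 2.0.x's spacy.gold helper):

import spacy
from spacy.gold import biluo_tags_from_offsets

nlp = spacy.blank("en")
text, annotations = spacy_model[0]

# Each (start, end, label) should slice out the intended span,
# e.g. "Comcast" -> ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS
for start, end, label in annotations["entities"]:
    print(repr(text[start:end]), label)

# Any "-" tag here would mean an entity is misaligned with token boundaries
print(biluo_tags_from_offsets(nlp(text), annotations["entities"]))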


EDIT: I tried on an Ubuntu 18.04 VM with 24 GB of RAM and 2 cores and got the following error:

*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)


EDIT2: Fixed here: https://github.com/explosion/spaCy/issues/2800#issuecomment-425057478

spacy
asked on Stack Overflow Sep 19, 2018 by jsthivierge • edited Oct 26, 2018 by jsthivierge


