I'm trying to train a new model in spaCy with custom entities and I'm running into issues when training it.
I have only one pipe (ner) and I add all of my entity types to it as labels.
I found that adding a large number of distinct labels (~219) to the ner pipe makes it crash on the first nlp.update call:
Process finished with exit code -1073740791 (0xC0000409)
I'm running spaCy 2.0.12 with Python 3.7 on a Windows 10 laptop with 16 GB of RAM. Any idea why it crashes on the first nlp.update call as the number of labels grows, and how can I prevent it? With only ~100 labels it works fine.
Here's my code:
import random

import spacy


def __train_model(self, spacy_model, entity_types):
    # Start from a blank English pipeline with a single NER component
    nlp = spacy.blank("en")
    ner = nlp.create_pipe("ner")
    nlp.add_pipe(ner)

    # Register every custom entity type as a label on the NER pipe
    for entity_type in list(entity_types):
        ner.add_label(entity_type)

    optimizer = nlp.begin_training()

    # Start training
    for i in range(20):
        losses = {}
        index = 0
        random.shuffle(spacy_model)
        for statement, entities in spacy_model:
            nlp.update([statement], [entities], sgd=optimizer, losses=losses, drop=0.5)

    return nlp
spacy_model:
[
    ('Simply put I see no other conclusion than Comcast has actively blocked our Smart TVs from accessing Netflix on purpose.', {'entities': [(42, 49, 'ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS'), (75, 80, 'DEVICE:COMMUNICATIONS:TV:FEATURE'), (100, 107, 'ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS')]}),
    ...
]
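For reference, each item in spacy_model follows spaCy's (text, annotations) training format: entity offsets are character positions with an exclusive end index. Here's a quick sanity check of the offsets (just an illustration, not part of the training code; `sample` is the first tuple from the data above):

# Verify that each (start, end, label) triple slices out the intended span.
sample = (
    'Simply put I see no other conclusion than Comcast has actively blocked our '
    'Smart TVs from accessing Netflix on purpose.',
    {'entities': [(42, 49, 'ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS'),
                  (75, 80, 'DEVICE:COMMUNICATIONS:TV:FEATURE'),
                  (100, 107, 'ORGANIZATION:SERVICE_PROVIDER:COMMUNICATIONS')]},
)

text, annotations = sample
for start, end, label in annotations['entities']:
    # The end offset is exclusive, so text[start:end] is the annotated span.
    print(repr(text[start:end]), '->', label)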
EDIT: I tried on an Ubuntu 18.04 VM with 24 GB of RAM and 2 cores and got the following error:
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
EDIT2: Fixed here: https://github.com/explosion/spaCy/issues/2800#issuecomment-425057478