I am trying to use FreeLing to recognize and classify named entities in Spanish. I am testing with the analyzer from the command line, as I still don't understand how to use the Python API. When I run the analyzer on a text, it doesn't seem to recognize or classify named entities, dates, or anything of the sort. This is what I run:
analyze -f es.cfg --ner --nec --date < mytext > out
where the content of mytext is:
En ese contexto, el ministro de Salud Pública, José Angel Portal Miranda, reiteró que, con 2 205 casos confirmados y 83 pacientes fallecidos, Cuba continúa bajando la letalidad hasta un 3,76 %, lo cual nos mantiene en el lugar 18 entre los 35 países de las Américas que reportan casos positivos hasta ayer 10 de junio.
and the output is:
En en SP 1
ese ese DD0MS0 0.966694
contexto contexto NCMS000 1
, , Fc 1
el el DA0MS0 1
ministro ministro NCMS000 1
de de SP 0.999961
Salud_Pública salud_pública NP00V00 1
, , Fc 1
José_Angel_Portal_Miranda josé_angel_portal_miranda NP00SP0 1
...
I don't see the classes of the entities anywhere, nor any indication that something is a date.
The content of es.cfg is:
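One way to check whether the tags in the third column already carry this information is to parse the output with a few lines of Python. This is only a sketch under assumptions: it assumes each line is "form lemma tag probability" as in the output above, and it assumes an EAGLES-style tag convention in which proper nouns start with NP, characters 5-6 of the tag encode the entity class, and dates get the tag W. The mapping in NE_CLASSES and the function names are made up for this example and are not confirmed by the output itself.

```python
# Assumed mapping from characters 5-6 of an NP tag to an entity class.
# This is an assumption about the tagset, not something the output confirms.
NE_CLASSES = {
    "SP": "person",
    "G0": "location",
    "O0": "organization",
    "V0": "other",
}


def parse_line(line):
    """Split one output line into (form, lemma, tag, probability), or None."""
    parts = line.split()
    if len(parts) < 4:
        return None  # skips blank lines and the "..." placeholder
    form, lemma, tag, prob = parts[:4]
    return form, lemma, tag, float(prob)


def describe(tag):
    """Return a label for named-entity and date tags, None for anything else."""
    if tag.startswith("NP") and len(tag) >= 6:
        return "named entity (" + NE_CLASSES.get(tag[4:6], "unknown") + ")"
    if tag == "W":  # assumed tag for date/time expressions
        return "date/time"
    return None


def extract_entities(output):
    """Collect (form, tag, label) for every line whose tag gets a label."""
    found = []
    for line in output.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        form, _lemma, tag, _prob = parsed
        label = describe(tag)
        if label is not None:
            found.append((form, tag, label))
    return found


# The first lines of the analyzer output shown above.
sample = """\
En en SP 1
ese ese DD0MS0 0.966694
contexto contexto NCMS000 1
, , Fc 1
el el DA0MS0 1
ministro ministro NCMS000 1
de de SP 0.999961
Salud_Pública salud_pública NP00V00 1
, , Fc 1
José_Angel_Portal_Miranda josé_angel_portal_miranda NP00SP0 1
...
"""

for form, tag, label in extract_entities(sample):
    print(form, tag, label)
```

If the assumed convention holds, the tag suffix would be where the class shows up; if it does not, the script at least makes it easy to list every distinct tag the analyzer produced for the full text.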
##
#### default configuration file for Spanish analyzer
##
#### General options
Lang=es
Locale=default
### Tagset description file, used by different modules
TagsetFile=$FREELINGSHARE/es/tagset.dat
#### Trace options. Only effective if we have compiled with -DVERBOSE
#
## Possible values for TraceModule (may be OR'ed)
#define SPLIT_TRACE 0x00000001
#define TOKEN_TRACE 0x00000002
#define MACO_TRACE 0x00000004
#define OPTIONS_TRACE 0x00000008
#define NUMBERS_TRACE 0x00000010
#define DATES_TRACE 0x00000020
#define PUNCT_TRACE 0x00000040
#define DICT_TRACE 0x00000080
#define SUFF_TRACE 0x00000100
#define LOCUT_TRACE 0x00000200
#define NP_TRACE 0x00000400
#define PROB_TRACE 0x00000800
#define QUANT_TRACE 0x00001000
#define NEC_TRACE 0x00002000
#define AUTOMAT_TRACE 0x00004000
#define TAGGER_TRACE 0x00008000
#define HMM_TRACE 0x00010000
#define RELAX_TRACE 0x00020000
#define RELAX_TAGGER_TRACE 0x00040000
#define CONST_GRAMMAR_TRACE 0x00080000
#define SENSES_TRACE 0x00100000
#define CHART_TRACE 0x00200000
#define GRAMMAR_TRACE 0x00400000
#define DEP_TRACE 0x00800000
#define UTIL_TRACE 0x01000000
TraceLevel=0
TraceModule=0x0000
## Options to control the applied modules. The input may be partially
## processed, or not a full analysis may me wanted. The specific
## formats are a choice of the main program using the library, as well
## as the responsability of calling only the required modules.
## Valid input/output formats are: plain, token, splitted, morfo, tagged, parsed
InputLevel=text
OutputLevel=morfo
# consider each newline as a sentence end
AlwaysFlush=no
#### Tokenizer options
TokenizerFile=$FREELINGSHARE/es/tokenizer.dat
#### Splitter options
SplitterFile=$FREELINGSHARE/es/splitter.dat
#### Morfo options
AffixAnalysis=yes
CompoundAnalysis=yes
MultiwordsDetection=yes
NumbersDetection=yes
PunctuationDetection=yes
DatesDetection=yes
QuantitiesDetection=yes
DictionarySearch=yes
ProbabilityAssignment=yes
DecimalPoint=,
ThousandPoint=.
LocutionsFile=$FREELINGSHARE/es/locucions.dat
QuantitiesFile=$FREELINGSHARE/es/quantities.dat
AffixFile=$FREELINGSHARE/es/afixos.dat
CompoundFile=$FREELINGSHARE/es/compounds.dat
ProbabilityFile=$FREELINGSHARE/es/probabilitats.dat
DictionaryFile=$FREELINGSHARE/es/dicc.src
PunctuationFile=$FREELINGSHARE/common/punct.dat
ProbabilityThreshold=0.001
# NER options
NERecognition=yes
# config file for "crf" machine learning NERC
# (recognition and classification in a single step)
NPDataFile=$FREELINGSHARE/es/nerc/nerc/nerc.dat
# config file for "basic" rule based NER
#NPDataFile=$FREELINGSHARE/es/np.dat
# config file for "bio" machine learning NER
# NPDataFile=$FREELINGSHARE/es/nerc/ner/ner-ab-poor1.dat
# NPDataFile=$FREELINGSHARE/es/nerc/ner/ner-ab-rich.dat
# "rich" model is trained with rich gazetteer. Offers higher accuracy but
# requires adapting gazetteer files to have high coverage on target corpus.
# "poor1" model is trained with poor gazetteer. Accuracy is splightly lower
# but suffers small accuracy loss the gazetteer has low coverage in target
# corpus. If in doubt, use "poor1" model.
## Phonetic encoding of words.
Phonetics=no
PhoneticsFile=$FREELINGSHARE/es/phonetics.dat
## NEC options. See README in common/nec
NEClassification=yes
NECFile=$FREELINGSHARE/es/nerc/nec/nec-ab-poor1.dat
#NECFile=$FREELINGSHARE/es/nerc/nec/nec-ab-rich.dat
## Sense annotation options (none,all,mfs,ukb)
SenseAnnotation=none
SenseConfigFile=$FREELINGSHARE/es/senses.dat
UKBConfigFile=$FREELINGSHARE/es/ukb.dat
#### Tagger options
Tagger=hmm
TaggerHMMFile=$FREELINGSHARE/es/tagger.dat
TaggerRelaxFile=$FREELINGSHARE/es/constr_gram-B.dat
TaggerRelaxMaxIter=500
TaggerRelaxScaleFactor=670.0
TaggerRelaxEpsilon=0.001
TaggerRetokenize=yes
TaggerForceSelect=tagger
#### Parser options
GrammarFile=$FREELINGSHARE/es/chunker/grammar-chunk.dat
#### Dependence Parser options
DependencyParser=lstm
DepLSTMFile=$FREELINGSHARE/es/dep_lstm/params-es.dat
#DependencyParser=txala
DepTxalaFile=$FREELINGSHARE/es/dep_txala/dependences.dat
#DependencyParser=treeler
DepTreelerFile=$FREELINGSHARE/es/treeler/dependences.dat
# Semantic Role Labelling options
SRLTreelerFile=$FREELINGSHARE/es/treeler/srl.dat
#### Coreference Solver options
#CorefFile=$FREELINGSHARE/es/coref/relaxcor_constit/relaxcor.dat
CorefFile=$FREELINGSHARE/es/coref/relaxcor_dep/relaxcor.dat
SemGraphExtractorFile=$FREELINGSHARE/es/semgraph/semgraph-SRL.dat
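To double-check which of these settings are actually active (as opposed to commented out), a small sketch like the following can list them. It assumes the file is made of plain Key=Value lines with # comments, as above. The names read_options and RELEVANT are made up for this example, and the script says nothing about how the command-line flags may override these values.

```python
def read_options(text):
    """Parse Key=Value lines, ignoring blank lines and # comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


# Options from es.cfg that look related to entities and dates.
RELEVANT = (
    "OutputLevel",
    "DatesDetection",
    "NERecognition",
    "NPDataFile",
    "NEClassification",
    "NECFile",
)

# Excerpt of the es.cfg shown above; in practice, read the whole file instead.
excerpt = """\
InputLevel=text
OutputLevel=morfo
DatesDetection=yes
# NER options
NERecognition=yes
NPDataFile=$FREELINGSHARE/es/nerc/nerc/nerc.dat
#NPDataFile=$FREELINGSHARE/es/np.dat
NEClassification=yes
NECFile=$FREELINGSHARE/es/nerc/nec/nec-ab-poor1.dat
#NECFile=$FREELINGSHARE/es/nerc/nec/nec-ab-rich.dat
"""

opts = read_options(excerpt)
for key in RELEVANT:
    print(key, "=", opts.get(key, "(not set)"))
```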
Please help :)