Named Entity Recognition through Redundancy Driven Classifiers

Zanoli, Roberto; Pianta, Emanuele; Giuliano, Claudio
2009-01-01

Abstract

We present Typhoon, a classifier combination system for Named Entity Recognition (NER) in which two different classifiers are combined to exploit Data Redundancy and Patterns extracted from a large text corpus. Data Redundancy arises when the same entity occurs in different places in documents, whereas Patterns are the 2-grams, 3-grams, 4-grams and 5-grams that precede and follow entities in documents. The system consists of two classifiers in cascade. It can also run with a single classifier, which makes it about 100 times faster (roughly 20,000 tokens/sec); the second classifier in the cascade can be used when more accuracy is needed. Moreover, the system can use additional features, such as the output of a Text Classifier that recognizes the category to which a story belongs. The system achieved the best performance on the Italian NER task at EVALITA 2009, with an F1 of 0.82.
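
The notion of Patterns described above can be illustrated with a minimal sketch. It assumes a whitespace-tokenized sentence and an entity given as a token span; the function name `context_patterns`, its parameters, and the example sentence are hypothetical and not taken from Typhoon itself, whose actual feature extraction is not described in this record.

```python
from typing import List, Sequence, Tuple

NGram = Tuple[str, ...]


def context_patterns(
    tokens: Sequence[str], start: int, end: int, min_n: int = 2, max_n: int = 5
) -> Tuple[List[NGram], List[NGram]]:
    """Collect the n-grams that precede and follow the entity tokens[start:end].

    Hypothetical helper: for each n from min_n to max_n it takes the n tokens
    immediately before the entity and the n tokens immediately after it,
    skipping any n-gram that would run past the sentence boundary.
    """
    preceding: List[NGram] = []
    following: List[NGram] = []
    for n in range(min_n, max_n + 1):
        if start - n >= 0:
            preceding.append(tuple(tokens[start - n:start]))
        if end + n <= len(tokens):
            following.append(tuple(tokens[end:end + n]))
    return preceding, following


# Illustrative sentence (an assumption, not data from the paper).
tokens = "the president of the company Mario Rossi said yesterday that sales grew".split()
before, after = context_patterns(tokens, start=5, end=7)  # entity: "Mario Rossi"
```

In this sketch `before` holds the 2- to 5-grams ending just before the entity and `after` holds those starting just after it; how such patterns are weighted or combined with redundancy information is left open here.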

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/5369