A Combination of Classifiers for Named Entity Recognition on Transcription

Zanoli, Roberto
2013-01-01

Abstract

This paper presents a Named Entity Recognition (NER) system for broadcast news transcriptions in which two different classifiers are set up in a loop, so that the output of one classifier is exploited by the other to refine its decision. The approach is similar to that of Typhoon, a NER system designed for newspaper articles; one distinguishing feature of our approach is the use of Conditional Random Fields in place of Hidden Markov Models. To build the second classifier, we extracted sentences from a large unlabelled corpus. Another relevant feature is specific to transcriptions: they lack orthographic and punctuation information, which typically results in poor performance. For this reason, an additional module for case and punctuation restoration has been developed. The paper describes the system and reports its performance, evaluated through participation in the Evalita 2011 task on Named Entity Recognition on Transcribed Broadcast News. In addition, the Evalita 2009 dataset, consisting of newspaper articles, is used for a comparative analysis of named entity extraction from newspapers and broadcast news.
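
The idea of one classifier's output feeding another can be illustrated with a toy two-pass tagger. This is only a minimal sketch under stated assumptions, not the system described in the paper: the real classifiers are Conditional Random Fields, whereas here a hypothetical cue-word rule (`TRIGGERS`) stands in for the first classifier, and the function names, labels, and example sentences are all invented for illustration. The first pass tags an unlabelled corpus, the majority label of each token is collected, and the second pass uses that label to tag occurrences the first pass misses.

```python
from collections import Counter, defaultdict

# Hypothetical cue words: the token following one of these gets its label.
# This rule is a stand-in for a trained first classifier (assumption).
TRIGGERS = {"president": "PER", "city": "LOC"}


def first_pass(tokens):
    """Label each token from its left neighbour only (local context)."""
    labels = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else ""
        labels.append(TRIGGERS.get(prev, "O"))
    return labels


def majority_labels(corpus):
    """Run the first pass over an unlabelled corpus and keep, for every
    token that was tagged as an entity, its most frequent label."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for tok, lab in zip(sentence, first_pass(sentence)):
            if lab != "O":
                counts[tok][lab] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def second_pass(tokens, global_labels):
    """Keep the local decision when there is one; otherwise fall back on
    the label the first pass assigned to the same token elsewhere."""
    local = first_pass(tokens)
    return [
        lab if lab != "O" else global_labels.get(tok, "O")
        for tok, lab in zip(tokens, local)
    ]


# Lower-cased, unpunctuated text, as in a raw transcription.
unlabelled = [
    ["president", "rossi", "spoke", "today"],
    ["the", "city", "trento", "hosted", "the", "event"],
]
global_labels = majority_labels(unlabelled)
sentence = ["rossi", "visited", "trento"]
print(first_pass(sentence))
print(second_pass(sentence, global_labels))
```

In this sketch the first pass alone finds no entity in the last sentence, because no cue word is present, while the second pass recovers both names from the labels collected over the unlabelled corpus.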

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/105803