A Combination of Classifiers for Named Entity Recognition on Transcription

Alam, F.; Zanoli, Roberto

Abstract. This paper presents a Named Entity Recognition (NER) system for broadcast news transcriptions in which two classifiers are arranged in a loop, so that the output of one is exploited by the other to refine its decisions. The approach is similar to that of Typhoon, a NER system designed for newspaper articles; one distinguishing feature of our approach is the use of Conditional Random Fields in place of Hidden Markov Models. To build the second classifier, we extracted sentences from a large unlabelled corpus. Another relevant feature is specific to transcriptions: they lack orthographic and punctuation information, which typically results in poor performance. To address this, an additional module for case and punctuation restoration has been developed. This paper describes the system and reports its performance, evaluated through participation in the Evalita 2011 task on Named Entity Recognition on Transcribed Broadcast News. In addition, the Evalita 2009 dataset, consisting of newspaper articles, is used for a comparative analysis of named entity extraction from newspapers and broadcast news.
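As a rough illustration of the idea of one classifier exploiting the output of another, the sketch below runs a first-pass tagger over an unlabelled corpus, records the majority label it assigns to each token, and lets a second pass fall back on that label when local context gives no evidence. It is a toy under stated assumptions, not the system described here: the trigger-word tagger stands in for a trained Conditional Random Field, and the names `TRIGGERS`, `first_pass`, `build_global_labels`, and `second_pass` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical trigger words; a real system would use a trained CRF instead.
TRIGGERS = {"mr": "PER", "president": "PER", "in": "LOC"}


def first_pass(tokens):
    """Stage 1 stand-in: tag each token from the word that precedes it."""
    tags = []
    for i, _ in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else ""
        tags.append(TRIGGERS.get(prev, "O"))
    return tags


def build_global_labels(unlabelled_sentences):
    """Run stage 1 over an unlabelled corpus and keep each token's majority entity tag."""
    counts = defaultdict(Counter)
    for tokens in unlabelled_sentences:
        for tok, tag in zip(tokens, first_pass(tokens)):
            if tag != "O":
                counts[tok][tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def second_pass(tokens, global_labels):
    """Stage 2 stand-in: keep the local tag, else fall back on the corpus-wide label."""
    local = first_pass(tokens)
    return [tag if tag != "O" else global_labels.get(tok, "O")
            for tok, tag in zip(tokens, local)]


# Lowercased, unpunctuated tokens mimic transcribed speech.
corpus = [
    ["president", "obama", "spoke"],
    ["mr", "obama", "arrived", "in", "rome"],
]
global_labels = build_global_labels(corpus)
sentence = ["obama", "visited", "rome"]
print(first_pass(sentence))
print(second_pass(sentence, global_labels))
```

In this toy setting the first pass finds no entities in the test sentence because no trigger word is present, while the second pass recovers labels for tokens that were tagged consistently elsewhere in the corpus.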