
Bootstrapping Named Entity Recognition for Italian Broadcast News

Federico, Marcello; Bertoldi, Nicola; Sandrini, Vanessa
2002-01-01

Abstract

This paper presents the development of a Named Entity (NE) recognition system for the Italian broadcast news domain. A statistical model is introduced, based on a trigram language model defined over words and NE classes. The NE model is estimated from a small list of 2,360 manually tagged NEs and a large untagged newspaper corpus. An iterative training procedure is applied that first estimates simpler models, whose parameters are then used to initialize the complete NE model. Finally, NE recognition experiments are reported on broadcast news transcripts generated by a speech recognition system.
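
As a rough illustration of how a statistical model over words and NE classes can label a transcript, the following minimal sketch may help. It is not the authors' system: it assumes a simplified bigram model over classes (rather than the trigram model of the abstract), supervised counts with add-k smoothing (rather than the iterative bootstrapping from untagged text), and Viterbi decoding. The class labels (`PERSON`, `LOCATION`, `O`), the toy sentences, and all function names are hypothetical.

```python
import math
from collections import defaultdict

def train(tagged_sentences, k=0.1):
    """Count class-transition and word-emission events from (word, class) pairs."""
    trans = defaultdict(lambda: defaultdict(float))
    emit = defaultdict(lambda: defaultdict(float))
    vocab, classes = set(), set()
    for sent in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-class
        for word, cls in sent:
            trans[prev][cls] += 1
            emit[cls][word] += 1
            vocab.add(word)
            classes.add(cls)
            prev = cls
    return trans, emit, vocab, sorted(classes), k

def log_prob(table, context, event, n_events, k):
    """Add-k smoothed log P(event | context)."""
    row = table.get(context, {})
    total = sum(row.values())
    return math.log((row.get(event, 0.0) + k) / (total + k * n_events))

def viterbi(words, model):
    """Return the most probable class sequence for a list of words."""
    trans, emit, vocab, classes, k = model
    n_words = len(vocab) + 1  # one extra slot for unseen words
    n_classes = len(classes)
    best = {"<s>": (0.0, [])}  # class -> (log score, best path ending there)
    for w in words:
        nxt = {}
        for cls in classes:
            e = log_prob(emit, cls, w, n_words, k)
            nxt[cls] = max(
                ((s + log_prob(trans, prev, cls, n_classes, k) + e, p + [cls])
                 for prev, (s, p) in best.items()),
                key=lambda x: x[0],
            )
        best = nxt
    return max(best.values(), key=lambda x: x[0])[1]

# Toy, hand-made training data (assumption, not from the paper).
toy_data = [
    [("il", "O"), ("presidente", "O"), ("ciampi", "PERSON"),
     ("a", "O"), ("roma", "LOCATION")],
    [("ciampi", "PERSON"), ("visita", "O"), ("milano", "LOCATION")],
    [("a", "O"), ("milano", "LOCATION"), ("il", "O"), ("presidente", "O")],
]
model = train(toy_data)
labels = viterbi(["ciampi", "a", "roma"], model)
print(labels)
```

In this sketch the smoothing constant `k` stands in for the more careful parameter estimation the abstract describes; a trigram version would condition each class on the two preceding classes instead of one.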

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/627