Bootstrapping Named Entity Recognition for Italian Broadcast News
Federico, Marcello;Bertoldi, Nicola;Sandrini, Vanessa
2002-01-01
Abstract
This paper presents the development of a Named Entity (NE) recognition system for the Italian broadcast news domain. A statistical model is introduced, based on a trigram language model defined over words and NE classes. The NE model is estimated from a small list of 2,360 manually tagged NEs and a large untagged newspaper corpus. An iterative training procedure is applied that first estimates simpler models, whose parameters are then used to initialize the complete NE model. Finally, NE recognition experiments are reported on broadcast news transcripts generated by a speech recognition system.