Broadcast news LM adaptation over time

Federico, Marcello; Bertoldi, Nicola
2004-01-01

Abstract

This paper investigates the problem of updating over time the statistical language model (LM) of an Italian broadcast news transcription system. Statistical adaptation methods are proposed that try to cope with the complex dynamics of news by exploiting newswire texts available daily on the Internet. In particular, contemporary news reports are used to extend the lexicon of the LM, so as to minimize the out-of-vocabulary (OOV) word rate, and to adapt the n-gram probabilities. Experiments on 19 news shows, spanning a period of one month, showed relative reductions of 58% in OOV word rate, 16% in perplexity, and 4% in word error rate (WER).
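
For readers unfamiliar with the metrics, the following is a minimal sketch of how an OOV word rate and its relative reduction can be computed when a lexicon is extended with words from recent news text. It is not the authors' method: the frequency-based word selection, the function names (`oov_rate`, `extend_lexicon`, `relative_reduction`), and the toy Italian sentences are all assumptions made for illustration.

```python
from collections import Counter


def oov_rate(tokens, lexicon):
    """Fraction of running words in `tokens` that are missing from `lexicon`."""
    if not tokens:
        return 0.0
    missing = sum(1 for t in tokens if t not in lexicon)
    return missing / len(tokens)


def extend_lexicon(lexicon, news_tokens, max_new_words):
    """Add the most frequent unseen words from recent news text.

    Assumption: new words are picked by raw frequency; the paper's
    actual selection criterion is not described in the abstract.
    """
    counts = Counter(t for t in news_tokens if t not in lexicon)
    new_words = [w for w, _ in counts.most_common(max_new_words)]
    return set(lexicon) | set(new_words)


def relative_reduction(before, after):
    """Relative reduction of a metric, e.g. 0.58 for a 58% reduction."""
    return (before - after) / before


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    baseline = {"il", "governo", "ha", "approvato", "la", "legge"}
    newswire = "il ministro ha presentato la riforma il ministro difende la riforma".split()
    transcript = "il ministro ha approvato la riforma fiscale".split()

    extended = extend_lexicon(baseline, newswire, max_new_words=2)
    before = oov_rate(transcript, baseline)
    after = oov_rate(transcript, extended)
    print(f"OOV before: {before:.3f}, after: {after:.3f}")
    print(f"relative reduction: {relative_reduction(before, after):.3f}")
```

The same `relative_reduction` formula applies to perplexity and WER, which is how figures such as 58%, 16%, and 4% are to be read: as reductions relative to the baseline value, not as absolute differences.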

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/2288