This paper investigates the problem of dynamically updating the language model (LM) of a broadcast news speech recognition system, in order to cope with language and topic changes, typical of the news domain. Statistical adaptation methods are proposed that exploit written news sources which are daily available on the Internet, i.e. newswires and newspapers. Specifically, LM adaptation is performed by extending the basic lexicon, in order to minimize the out-of-vocabulary (OOV) rate, and by adapting the word probability distribution on the contemporary data. Experiments performed on 19 newscasts showed relative reductions of 58% on the OOV rate, 16% on the perplexity, and 4% on the word error rate
Broadcast News LM Adaptation using Contemporary Texts
Federico, Marcello;Bertoldi, Nicola
2001-01-01
Abstract
This paper investigates the problem of dynamically updating the language model (LM) of a broadcast news speech recognition system, in order to cope with language and topic changes, typical of the news domain. Statistical adaptation methods are proposed that exploit written news sources which are daily available on the Internet, i.e. newswires and newspapers. Specifically, LM adaptation is performed by extending the basic lexicon, in order to minimize the out-of-vocabulary (OOV) rate, and by adapting the word probability distribution on the contemporary data. Experiments performed on 19 newscasts showed relative reductions of 58% on the OOV rate, 16% on the perplexity, and 4% on the word error rateI documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.