
Adaptive Estimation of N-gram Language Models

Federico, Marcello
1995-01-01

Abstract

Stochastic n-gram language models have been successfully applied in continuous speech recognition for several years. Such language models provide many computational advantages but also require huge text corpora for parameter estimation. Moreover, the texts must exactly reflect, in a statistical sense, the user's language. In fact, even within a single application domain (e.g. medical reporting), people use language in different ways, and consequently with different statistical features. Estimating a language model on a sample that is not representative severely affects speech recognition performance. A solution to this problem is suggested by the techniques employed in acoustic modeling to adapt a speaker-independent model to a speaker-dependent one: a language model can first be estimated on a large user-independent corpus and then incrementally adapted to the user's language during system usage. In this paper, the Bayesian and maximum a posteriori adaptation methods are presented and an interpolation model is derived. Moreover, an EM-derived algorithm for estimating the latter model is described. Experimental comparisons have been carried out in terms of perplexity and recognition accuracy. The interpolation model outperforms the classical methods with only a few thousand training words, and it is competitive with direct language model estimation when enough training data are available.
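To illustrate the kind of adaptation scheme the abstract describes, the sketch below estimates a single interpolation weight between a user-independent (background) n-gram model and a user-specific model via EM, and evaluates the perplexity of the interpolated model. This is a minimal, generic sketch, not the paper's actual formulation: the per-word probability streams p_bg and p_user, the use of one global weight, and all function names are assumptions introduced here for illustration.

    # Illustrative sketch (not the paper's exact method): EM estimation of the
    # interpolation weight lambda in
    #     P(w|h) = lambda * P_user(w|h) + (1 - lambda) * P_bg(w|h)
    # The component models are assumed to be given as per-word probability
    # streams computed over the same adaptation text.
    import math

    def em_interpolation_weight(p_bg, p_user, lam=0.5, iters=20, tol=1e-6):
        """Estimate the mixture weight lambda by EM on an adaptation word stream."""
        n = len(p_bg)
        for _ in range(iters):
            # E-step: posterior probability that each word came from the user model.
            post = [lam * pu / (lam * pu + (1.0 - lam) * pb)
                    for pb, pu in zip(p_bg, p_user)]
            # M-step: the new weight is the average posterior.
            new_lam = sum(post) / n
            if abs(new_lam - lam) < tol:
                lam = new_lam
                break
            lam = new_lam
        return lam

    def perplexity(p_bg, p_user, lam):
        """Perplexity of the interpolated model on the same word stream."""
        log_prob = sum(math.log(lam * pu + (1.0 - lam) * pb)
                       for pb, pu in zip(p_bg, p_user))
        return math.exp(-log_prob / len(p_bg))

    if __name__ == "__main__":
        # Toy per-word probabilities standing in for real model scores.
        p_bg   = [0.02, 0.001, 0.05, 0.01, 0.003]
        p_user = [0.10, 0.020, 0.01, 0.08, 0.050]
        lam = em_interpolation_weight(p_bg, p_user)
        print("estimated lambda:", round(lam, 3))
        print("perplexity:", round(perplexity(p_bg, p_user, lam), 2))

In this toy setting the weight converges toward the user model because the adaptation words are consistently more probable under it; with scarce adaptation data the estimated weight stays closer to the background model, which is the behavior incremental adaptation relies on.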


Use this identifier to cite or link to this document: https://hdl.handle.net/11582/1160
