Adaptive Estimation of N-gram Language Models
Federico, Marcello
1995-01-01
Abstract
Stochastic n-gram language models have been successfully applied in continuous speech recognition for several years. Such language models provide many computational advantages but also require huge text corpora for parameter estimation. Moreover, the texts must exactly reflect, in a statistical sense, the user's language. In fact, even within a single application domain (e.g. medical reporting), people use language in different ways, and consequently with different statistical features. Estimating a language model on a sample that is not representative severely degrades speech recognition performance. A solution to this problem is suggested by the techniques employed in acoustic modeling to adapt a speaker-independent model into a speaker-dependent one. Analogously, a language model could first be estimated on a large user-independent corpus and then incrementally adapted to the user's language during system usage. In this paper, the Bayesian and maximum a posteriori (MAP) adaptation methods are presented, and an interpolation model is derived. Moreover, an EM-derived algorithm for estimating the latter model is described. Experimental comparisons have been carried out in terms of perplexity and recognition accuracy. The interpolation model outperforms the classical methods with only a few thousand training words, and it is competitive with direct language model estimation when enough training data are available.
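The paper itself gives the details of the interpolation model; as a rough illustration of the general idea, the sketch below shows EM estimation of a single mixture weight lambda combining a fixed user-independent distribution with a user-dependent one, with the weight re-estimated on held-out user text. All names, the unigram simplification, and the flooring constant are assumptions for this example, not the paper's actual formulation.

```python
def estimate_interpolation_weight(held_out, p_user, p_background,
                                  iters=100, floor=1e-9):
    """EM for lambda in the mixture
        p(w) = lambda * p_user(w) + (1 - lambda) * p_background(w).

    held_out     : list of words from the target user (adaptation sample)
    p_user       : dict word -> prob, user-dependent model (hypothetical)
    p_background : dict word -> prob, user-independent model (hypothetical)
    """
    lam = 0.5  # uninformative starting point
    for _ in range(iters):
        # E-step: posterior probability that each held-out word was
        # generated by the user-dependent component.
        posteriors = []
        for w in held_out:
            num = lam * p_user.get(w, floor)
            den = num + (1.0 - lam) * p_background.get(w, floor)
            posteriors.append(num / den)
        # M-step: the new weight is the average posterior.
        lam = sum(posteriors) / len(posteriors)
    return lam

# Toy usage: the user favours word 'a' more than the background model does,
# so the weight shifts toward the user-dependent component.
p_user = {'a': 0.9, 'b': 0.1}
p_background = {'a': 0.5, 'b': 0.5}
lam = estimate_interpolation_weight(['a', 'a', 'a', 'b'], p_user, p_background)
```

Each EM iteration is guaranteed not to decrease the held-out likelihood, which is why such weights are trained on data held out from the corpora used to estimate the component models.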