
Language Modelling for Efficient Beam-Search

Marcello Federico; Mauro Cettolo; Fabio Brugnara
1995-01-01

Abstract

This paper considers the problems of estimating bigram language models and of efficiently representing them as a finite-state network that can be employed by a hidden Markov model based, beam-search, continuous speech recognizer. A review of the best-known bigram estimation techniques is given, together with a description of the original Stacked model. Language model comparisons in terms of perplexity are given for three text corpora with different data-sparseness conditions, while speech recognition accuracy tests are presented for a 10,000-word, real-time, speaker-independent dictation task. The Stacked estimation method compares favorably with the others, achieving about 93% word accuracy. While better language model estimates can improve recognition accuracy, representations better suited to the search algorithm can improve its speed as well. Two static representations of language models are introduced: linear and tree-based. Results show that the latter organization is better exploited by the beam-search algorithm, as it provides a response five times faster with the same word accuracy. Finally, an off-line reduction algorithm is presented that cuts the space requirements of the tree-based topology to about 40%. The solutions presented here have been successfully employed in a real-time, speaker-independent, 10,000-word dictation system for radiological reporting.
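As a rough illustration of the perplexity measure used above to compare the estimation techniques, the sketch below trains a toy bigram model and scores a held-out text. The simple linear-interpolation smoothing, the corpus, and all function names are illustrative assumptions; this is not the Stacked estimator described in the paper.

```python
import math
from collections import Counter

def train_bigram(sentences, lam=0.8):
    """Estimate a bigram model, interpolating bigram relative frequencies
    with the unigram distribution (an illustrative smoothing scheme only)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    def prob(w, prev):
        p_uni = unigrams[w] / total if total else 0.0
        p_bi = bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
        return lam * p_bi + (1.0 - lam) * p_uni

    return prob

def perplexity(prob, sentences):
    """Perplexity = exp of the average negative log-probability
    assigned by the model to each word of the test text."""
    log_sum, n_words = 0.0, 0
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            log_sum += -math.log(max(prob(w, prev), 1e-12))
            n_words += 1
    return math.exp(log_sum / n_words)

# Toy usage: train on one small corpus, evaluate on another.
train = [["the", "scan", "is", "normal"], ["the", "report", "is", "normal"]]
test = [["the", "scan", "is", "normal"]]
model = train_bigram(train)
print(f"perplexity: {perplexity(model, test):.2f}")
```

Lower perplexity indicates that the model assigns higher probability to the test text; the paper uses this measure to compare estimators across corpora with different degrees of data sparseness.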


Use this identifier to cite or link to this document: https://hdl.handle.net/11582/3404
