Word Duration Modeling for Word Graph Rescoring in LVCSR

Seppi, Dino; Falavigna, Giuseppe Daniele; Stemmer, Georg; Gretter, Roberto
2007

Abstract

A well-known weakness of HMMs in speech recognition is their poor representation of phone and word durations. This paper describes an approach that addresses this limitation by integrating explicit word duration models into an HMM-based speech recognizer. Word durations are represented by log-normal densities. A back-off strategy approximates the durations of rarely observed words by combining the statistics of suitable sub-word units. Furthermore, two normalization procedures are compared, both of which reduce the influence of the implicit HMM duration distribution that results from the state-to-state transition probabilities. Experiments on European parliamentary speeches in English and Spanish show that the proposed approaches are effective and lead to small but consistent reductions in the word error rate for large-vocabulary speech recognition tasks.
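The abstract does not say how the sub-word statistics are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that phone durations are independent, so that their means and variances add, and that the resulting moments are matched to a log-normal density. The names `MIN_COUNT`, `word_stats`, `phone_stats`, and `lexicon`, as well as all numeric values, are hypothetical.

```python
import math

MIN_COUNT = 10  # hypothetical threshold below which a word counts as "rarely observed"


def lognormal_logpdf(d, mu, sigma):
    """Log of the log-normal density at duration d > 0.

    mu and sigma are the mean and standard deviation of ln(d).
    """
    return (-math.log(d * sigma * math.sqrt(2.0 * math.pi))
            - (math.log(d) - mu) ** 2 / (2.0 * sigma ** 2))


def moment_match(mean, var):
    """Return (mu, sigma) of the log-normal with the given mean and variance."""
    sigma2 = math.log(1.0 + var / mean ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def word_duration_params(word, word_stats, phone_stats, lexicon):
    """Return log-normal (mu, sigma) for a word, backing off to phones.

    word_stats:  word  -> (count, mean, variance) of durations in seconds
    phone_stats: phone -> (mean, variance) of durations in seconds
    lexicon:     word  -> list of phones
    """
    entry = word_stats.get(word)
    if entry is not None and entry[0] >= MIN_COUNT:
        # Enough observations: use the word's own statistics.
        _, mean, var = entry
    else:
        # Back-off (assumption): sum phone means and variances,
        # treating phone durations as independent.
        phones = lexicon[word]
        mean = sum(phone_stats[p][0] for p in phones)
        var = sum(phone_stats[p][1] for p in phones)
    return moment_match(mean, var)


# Toy statistics, invented purely for illustration.
word_stats = {"the": (100, 0.12, 0.002)}
phone_stats = {"k": (0.08, 0.001), "ae": (0.10, 0.002), "t": (0.07, 0.001)}
lexicon = {"the": ["dh", "ah"], "cat": ["k", "ae", "t"]}

mu, sigma = word_duration_params("cat", word_stats, phone_stats, lexicon)
score = lognormal_logpdf(0.25, mu, sigma)  # log-density of a 0.25 s hypothesis
```

In a rescoring setup, a score like `score` would be added, with some weight, to the acoustic and language model scores of each word hypothesis in the word graph. The weighting and the normalization against the implicit HMM duration distribution are not covered by this sketch.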

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/3352