HMM Training with Contaminated Speech Material for Distant-Talking Speech Recognition

Matassoni, Marco; Omologo, Maurizio; Giuliani, Diego; Svaizer, Piergiorgio
2002-01-01

Abstract

A challenging scenario is addressed in which a distant-talking speech recognizer operates in a noisy office environment with model adaptation. Both a single far microphone and a microphone array input are investigated. In addition to the benefits of microphone array processing, system robustness is improved by training hidden Markov models (HMMs) on a contaminated version of a clean corpus. This artificial corpus is produced by exploiting information extracted from real-world acoustic scenarios. The resulting models are then used as a starting point for unsupervised incremental adaptation. Experimental results on a connected-digits task show that the improvements in recognition accuracy due to multiple microphones, HMM training on contaminated speech, and incremental adaptation are additive. Moreover, the results show that unsupervised incremental adaptation benefits from starting with models trained on contaminated speech. A final contribution of the paper concerns the influence of speech activity detection accuracy, which appears to be relevant when moving towards real applications.
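The abstract does not spell out how the real-world information is turned into a contaminated corpus. A common approach, assumed here rather than taken from the paper, is to convolve each clean utterance with an impulse response measured in the target room and then add noise recorded there, scaled to a chosen signal-to-noise ratio (SNR). The minimal plain-Python sketch below illustrates that idea. The helpers `convolve`, `power`, and `contaminate`, the toy impulse response, and the synthetic Gaussian noise are all hypothetical stand-ins, not the authors' tools or data.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def power(samples):
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in samples) / len(samples)


def contaminate(clean, impulse_response, noise, snr_db):
    """Hypothetical helper: reverberate clean speech, then add scaled noise.

    Assumption: the SNR is measured between the reverberated speech and the
    added noise over the full output length.
    """
    reverberant = convolve(clean, impulse_response)
    if len(noise) < len(reverberant):
        raise ValueError("noise segment is shorter than the reverberant speech")
    noise = noise[:len(reverberant)]
    # Pick gain g so that 10*log10(P_speech / (g^2 * P_noise)) == snr_db.
    gain = math.sqrt(power(reverberant) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(reverberant, noise)]


# Usage with toy data (a sine "utterance", not real speech or a measured RIR).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
rir = [1.0, 0.0, 0.5, 0.0, 0.25]  # hypothetical impulse response
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
noisy = contaminate(clean, rir, noise, snr_db=10.0)
```

Applying the impulse response before adding noise matches the usual physical assumption that the speaker's voice is reverberated by the room while background noise is recorded already in place. A real pipeline would use measured multichannel impulse responses and recorded office noise rather than these toy signals.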
Use this identifier to cite or link to this document: https://hdl.handle.net/11582/204