Training of HMM with Contaminated Speech Material for Hands-Free Speech Recognition

Matassoni, Marco; Omologo, Maurizio; Giuliani, Diego; Svaizer, Piergiorgio
2000-01-01

Abstract

A challenging scenario is addressed in which a hands-free speech recognizer with model adaptation capabilities operates in a noisy office environment. Both a single far microphone and a microphone array input are investigated. Besides the benefits of microphone array processing, system robustness is improved by training hidden Markov models (HMMs) on a contaminated version of a clean corpus. This artificial corpus is produced by exploiting information extracted from "real world" acoustic scenarios. The resulting models are then used as the starting point for unsupervised incremental adaptation. Experimental results of connected digit recognition in a real noisy environment show the advantages of jointly using microphone array processing, HMM training on contaminated speech, and incremental adaptation, as well as the respective contribution of each to the overall performance improvement: the word recognition rate rose from approximately 30% with the baseline system to 99% with the best system configuration.
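The abstract does not spell out how the contaminated corpus is built. One common way to derive it from "real world" acoustic measurements is to convolve each clean utterance with an impulse response measured in the target room and then add noise recorded there at a chosen signal-to-noise ratio (SNR). The sketch below assumes exactly that recipe; the function names (`convolve`, `power`, `contaminate`), the toy impulse response and the `snr_db` parameter are illustrative assumptions, not the authors' actual procedure.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution: len(signal) + len(impulse_response) - 1 samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def power(samples):
    """Mean power (mean squared amplitude) of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def contaminate(clean, impulse_response, noise, snr_db):
    """Hypothetical contamination step (assumed, not taken from the paper):
    reverberate clean speech with a measured room impulse response, then add
    a recorded noise segment scaled so the result has the requested SNR."""
    reverberant = convolve(clean, impulse_response)
    if len(noise) < len(reverberant):
        raise ValueError("noise must be at least as long as the reverberant signal")
    noise = noise[: len(reverberant)]
    # Choose gain so that 10*log10(P_reverberant / P_scaled_noise) == snr_db.
    gain = math.sqrt(power(reverberant) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(reverberant, noise)]


# Usage with synthetic data: a 440 Hz tone at 8 kHz, a toy 5-tap impulse
# response, and Gaussian noise standing in for a real noise recording.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]
rir = [1.0, 0.0, 0.5, 0.0, 0.25]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
noisy = contaminate(clean, rir, noise, snr_db=10.0)
```

Measuring the SNR against the reverberant signal rather than the clean one keeps reverberation and additive noise as separate controls; a real corpus would use measured impulse responses and noise recordings in place of the synthetic stand-ins.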
2000

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/136
Warning: the data displayed have not been validated by the university.
