Training of HMM with Contaminated Speech Material for Hands-Free Speech Recognition
Matassoni, Marco;Omologo, Maurizio;Giuliani, Diego;Svaizer, Piergiorgio
2000-01-01
Abstract
A challenging scenario is addressed in which a hands-free speech recognizer with model adaptation capabilities operates in a noisy office environment. Both a single far microphone and a microphone array input are investigated. Besides the benefits of microphone array processing, system robustness is improved by training hidden Markov models (HMMs) on a contaminated version of a clean corpus. This artificial corpus is produced by exploiting information extracted from "real world" acoustic scenarios. The resulting models are then used as the starting point for unsupervised incremental adaptation. Experimental results on connected digit recognition in a real noisy environment show the advantages of jointly using microphone array processing, HMM training on contaminated speech, and incremental adaptation, as well as the contribution of each to the overall improvement. The word recognition rate rises from approximately 30% with the baseline system to 99% with the best system configuration.
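The abstract does not specify how the contaminated corpus is generated. As a rough illustration only, one common way to contaminate clean speech is to convolve it with a room impulse response and add background noise at a chosen signal-to-noise ratio. The sketch below assumes that approach; the function name `contaminate` and its parameters (`impulse_response`, `noise`, `snr_db`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def contaminate(clean, impulse_response, noise, snr_db):
    """Return a reverberant, noisy copy of a clean utterance (illustrative only)."""
    # Simulate the room by convolving with an impulse response,
    # then trim to the original utterance length.
    reverberant = np.convolve(clean, impulse_response)[: len(clean)]

    # Repeat or crop the noise recording so it covers the whole utterance.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]

    # Scale the noise so that speech power / noise power matches snr_db.
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))

    return reverberant + gain * noise
```

In such a setup, the impulse response and the noise segment would be measured in the target environment, and every utterance of the clean corpus would be passed through the function before HMM training.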