Hands-free speech recognition using a filtered clean corpus and incremental HMM adaptation

Matassoni, Marco; Omologo, Maurizio; Giuliani, Diego

This work addresses a challenging scenario in which a hands-free speech recognizer with incremental model adaptation operates in a noisy office environment. Both a single far microphone and a microphone array are investigated as input. Previous work showed that the acoustic mismatch remaining after microphone array processing can be further reduced by conditioning Hidden Markov Models (HMMs) to the operating acoustic conditions. Conditioned HMMs are models trained on a "filtered" version of a clean corpus, that is, speech material that better represents noisy real environments. The conditioned models are then used as initial models for unsupervised incremental adaptation. Experimental results on connected digit recognition show that models trained with filtered clean speech yield better recognition performance than models trained with clean speech. Furthermore, the results show a significant performance increase when incremental adaptation is applied, even after only a few utterances have been recognized.
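
The abstract does not spell out how the filtered corpus is produced. As an illustration only, the sketch below assumes one common way of conditioning clean speech to a target environment: convolve each clean utterance with an impulse response of the room and add background noise scaled to a chosen signal-to-noise ratio. The helper names (`convolve`, `power`, `filter_clean_utterance`), the toy impulse response, the synthetic signals, and the 15 dB SNR are all hypothetical and are not taken from the paper.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(samples):
    """Mean squared amplitude of a non-empty sequence."""
    return sum(v * v for v in samples) / len(samples)


def filter_clean_utterance(clean, impulse_response, noise, snr_db):
    """Hypothetical helper: reverberate clean speech, then add scaled noise.

    Assumes `noise` is non-empty and not all zeros.
    """
    # Simulate the room by convolution, truncated to the original length.
    reverberant = convolve(clean, impulse_response)[:len(clean)]
    # Repeat the noise recording if it is shorter than the utterance.
    tiled = [noise[i % len(noise)] for i in range(len(reverberant))]
    # Scale the noise so that the speech-to-noise power ratio equals snr_db.
    target_noise_power = power(reverberant) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(tiled))
    return [r + gain * n for r, n in zip(reverberant, tiled)]


# Toy data standing in for a clean utterance, a room response, and office noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
impulse_response = [1.0, 0.0, 0.5, 0.0, 0.25]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
filtered = filter_clean_utterance(clean, impulse_response, noise, snr_db=15.0)
```

In a setup of this kind, acoustic features extracted from the `filtered` signals rather than from `clean` would be used to train the initial models, which unsupervised adaptation would then refine on incoming utterances.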