Robust HMM training and adaptation in hands-free speech recognition
Giuliani, Diego; Matassoni, Marco; Omologo, Maurizio; Svaizer, Piergiorgio
1999-01-01
Abstract
In this paper, we address a challenging scenario in which a hands-free speech recognizer operates in a noisy office environment with either batch or incremental model adaptation. Microphone array processing compensates for only part of the mismatch between training and testing acoustic conditions. Previous work showed that this acoustic mismatch can be further reduced by conditioning hidden Markov models (HMMs) to certain assumed operating acoustic conditions. Conditioned HMMs are obtained by training on a filtered version of the clean speech corpus. Starting from that result, in this work we investigate the use of conditioned models as initial models for both supervised batch adaptation and unsupervised incremental adaptation. Experimental results on a hands-free connected digit recognition task show that models trained with filtered clean speech yield better recognition performance than models trained with clean speech, under both batch and incremental model adaptation.
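The conditioning step, training on a filtered copy of the clean speech corpus, can be illustrated with a minimal sketch. It assumes the filter is a single causal linear FIR response applied identically to every utterance; the abstract does not specify the filter, so the tap values and the helper names `fir_filter` and `condition_corpus` below are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filtering: y[n] = sum_k taps[k] * signal[n - k].

    The output has the same length as the input (the filter tail is truncated).
    """
    out: List[float] = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:
                break  # no samples before the start of the utterance
            acc += h * signal[n - k]
        out.append(acc)
    return out


def condition_corpus(
    corpus: Dict[str, Sequence[float]], taps: Sequence[float]
) -> Dict[str, List[float]]:
    """Return a filtered copy of every utterance in a clean-speech corpus.

    Hypothetical helper: the assumed operating conditions are modelled
    here only as one fixed FIR response.
    """
    return {utt_id: fir_filter(samples, taps) for utt_id, samples in corpus.items()}


if __name__ == "__main__":
    clean = {"utt1": [1.0, 0.0, 0.0, 2.0]}
    taps = [1.0, 0.5]  # hypothetical two-tap response
    print(condition_corpus(clean, taps))  # {'utt1': [1.0, 0.5, 0.0, 2.0]}
```

Under this reading, the filtered corpus would then feed ordinary HMM training, and the resulting conditioned models would serve as the starting point for batch or incremental adaptation instead of models trained on the unfiltered clean corpus.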