Robust HMM training and adaptation in hands-free speech recognition

Giuliani, Diego;Matassoni, Marco;Omologo, Maurizio;Svaizer, Piergiorgio
1999-01-01

Abstract

This paper addresses a challenging scenario in which a hands-free speech recognizer operates in a noisy office environment with either batch or incremental model adaptation. Microphone array processing compensates for only part of the mismatch between training and testing acoustic conditions. Previous work showed that the acoustic mismatch can be further reduced by conditioning hidden Markov models (HMMs) to certain assumed operating acoustic conditions. Conditioned HMMs are obtained by training on a filtered version of the clean speech corpus. Starting from that result, this work investigates the use of conditioned models as initial models for both supervised batch adaptation and unsupervised incremental adaptation. Experimental results on a hands-free connected digit recognition task show that models trained with filtered clean speech yield better recognition performance than models trained with clean speech, under both batch and incremental model adaptation.
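
As a rough illustration of what "training on a filtered version of the clean speech corpus" could involve, the sketch below filters a clean utterance with an impulse response before it would be passed to feature extraction and HMM training. The abstract does not specify the filter, so everything here is an assumption: the synthetic impulse response, the function names `synthetic_impulse_response` and `condition_utterance`, and the peak rescaling are hypothetical choices, not the authors' procedure.

```python
import numpy as np

def synthetic_impulse_response(length=800, decay=0.005, seed=0):
    """Hypothetical stand-in for a measured room impulse response:
    a unit direct path followed by exponentially decaying noise."""
    rng = np.random.default_rng(seed)
    n = np.arange(length)
    h = 0.1 * rng.standard_normal(length) * np.exp(-decay * n)
    h[0] = 1.0  # direct path
    return h

def condition_utterance(clean, h):
    """Filter a clean utterance with h (assumed filter), keep the
    original length, and rescale to the original peak amplitude."""
    filtered = np.convolve(clean, h)[: len(clean)]
    peak = np.max(np.abs(filtered))
    if peak > 0:
        filtered = filtered * (np.max(np.abs(clean)) / peak)
    return filtered

# Usage: filter one synthetic "utterance" (random samples stand in for speech).
rng = np.random.default_rng(1)
clean = rng.standard_normal(16000)
conditioned = condition_utterance(clean, synthetic_impulse_response())
```

In a real setup the impulse response would presumably be chosen to reflect the assumed operating conditions, and each utterance in the training corpus would be filtered the same way before training.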

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/1838