Towards Speaker and Environmental Robustness in ASR: the HIWIRE project

Matassoni, Marco; Svaizer, Piergiorgio
2006-01-01

Abstract

In this paper, we present algorithms for dealing with variability and mismatch in speech recognition due to environmental conditions and non-native speaker populations. The proposed algorithms cover a broad spectrum of ideas, including robust feature extraction, feature compensation and speech enhancement. Specifically, the following algorithms are presented and evaluated: beamforming for multi-microphone speech recognition, robust modulation and fractal features, Teager energy cepstrum coefficients, parametric feature equalization, speech enhancement, and acoustic modeling for non-native speech recognition. The problems of feature fusion and voice activity detection are also discussed. Evaluation results on the AURORA databases, obtained under the auspices of the HIWIRE project, show that significant gains can be achieved under adverse or mismatched conditions using these algorithms. Relative error rate reductions of up to 50% were shown for multi-microphone speech recognition, robust feature combination and speech enhancement. Reductions of 30-40% were shown for parametric feature equalization and non-native acoustic models.
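As a reading aid for the percentages quoted above, the following minimal sketch shows how a relative error rate reduction is conventionally computed from a baseline and an improved error rate. The function name and the numbers are illustrative assumptions only; they are not taken from the paper, and the abstract does not state which error metric or baseline values were used.

```python
def relative_error_reduction(baseline_err: float, improved_err: float) -> float:
    """Return the relative error rate reduction, in percent.

    Both arguments are error rates in the same unit (e.g. word error
    rate in percent). The name of this helper is hypothetical.
    """
    if baseline_err <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Hypothetical values for illustration, NOT results from the paper.
baseline = 20.0   # assumed baseline error rate (%)
improved = 10.0   # assumed error rate after applying a robust front end (%)

reduction = relative_error_reduction(baseline, improved)
print(f"{reduction:.1f}% relative error rate reduction")
```

Under these assumed values, halving the error rate from 20% to 10% is a 50% relative reduction, even though the absolute reduction is only 10 percentage points.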
Use this identifier to cite or link to this document: https://hdl.handle.net/11582/3473