An Auditory Based Modulation Spectral Feature for Reverberant Speech Recognition

Maganti, Hari Krishna;Matassoni, Marco
2010-01-01

Abstract

This paper presents an auditory-based modulation spectral feature that improves automatic speech recognition (ASR) performance in the presence of room reverberation. The approach extracts features modeled on auditory processing, specifically long-term modulation spectral features based on gammatone filtering, to reduce sensitivity to environmental noise while preserving the speech intelligibility information that is essential for ASR. Experiments are performed on the Aurora-5 meeting recorder digit task, recorded with four different microphones in hands-free mode in a real meeting room. For comparison, recognition results are also obtained with the standard ETSI basic and advanced front-ends and with conventional features using standard feature compensation. The experimental results show that the proposed features provide reliable and considerable improvements over these state-of-the-art feature extraction techniques.
Files in this record:
File: INTERSPEECH2010.pdf (not available)
Type: Post-print document
License: DRM not defined
Size: 264.66 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/20669