On the relationship between Early-to-Late Ratio of Room Impulse Responses and ASR performance in reverberant environments

Brutti, Alessio; Matassoni, Marco
2015

Abstract

This work presents an experimental analysis of distant-talking speech recognition in a variety of reverberant conditions, correlating ASR performance with a compact representation of the propagation channel (i.e., the room impulse response). It is well known that reverberation and background noise degrade speech recognition performance, but few studies have comprehensively investigated the relation between room impulse responses and recognition rates. In particular, we show how ASR accuracy is related to features derived from the structure of the early arrivals and of the reverberation tail. We therefore propose a representation based on a combination of a few parameters and use it to analyse the impact of reverberation on different speech recognition tasks. A possible application of the derived measure is data contamination for acoustic modeling, where the feature can be employed either to select the most suitable model for a given acoustic condition or to define the subset of room impulse responses used to create partially matched reverberant models. Recognition results obtained with different back-end solutions (GMM, DNN), on data generated with the image method and with real impulse responses, validate the effectiveness of the approach.
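As an illustration of the kind of feature named in the title, the following minimal sketch computes an early-to-late energy ratio from a room impulse response. The abstract does not give the authors' exact definition, so everything here is an assumption: the direct path is taken to be the largest-magnitude sample, the early/late boundary is placed 50 ms after it (as in the common clarity index C50), and the function names `early_to_late_ratio_db` and `synthetic_rir` are hypothetical.

```python
import numpy as np


def early_to_late_ratio_db(rir, fs, split_ms=50.0):
    """Early-to-late energy ratio of an impulse response, in dB.

    Assumptions (not taken from the paper): the direct path is the
    largest-magnitude sample, and "early" means the first `split_ms`
    milliseconds starting at that sample.
    """
    rir = np.asarray(rir, dtype=float)
    onset = int(np.argmax(np.abs(rir)))              # assumed direct-path arrival
    split = onset + int(round(split_ms * 1e-3 * fs))  # early/late boundary (samples)
    early = np.sum(rir[onset:split] ** 2)
    late = np.sum(rir[split:] ** 2)
    if late == 0.0:
        return float("inf")                           # no reverberation tail at all
    return 10.0 * np.log10(early / late)


def synthetic_rir(fs, t60, duration=1.0, seed=0):
    """Toy impulse response: unit direct path plus exponentially decaying noise.

    This is a hypothetical stand-in for a measured or image-method response,
    used only to exercise the function above.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * fs)) / fs
    # exp(-6.908 * t / t60) is an amplitude decay of 60 dB at t = t60.
    rir = 0.1 * rng.standard_normal(t.size) * np.exp(-6.908 * t / t60)
    rir[0] = 1.0                                      # direct path
    return rir


if __name__ == "__main__":
    fs = 16000
    for t60 in (0.3, 0.6, 0.9):
        elr = early_to_late_ratio_db(synthetic_rir(fs, t60), fs)
        print(f"T60 = {t60:.1f} s -> early-to-late ratio = {elr:.1f} dB")
```

On these toy responses the ratio falls as the reverberation time grows, which is the qualitative behaviour one would expect of a measure meant to track how strongly the reverberation tail dominates the early arrivals.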
Files in this item:
File: paper_final.pdf (not available)
Type: Pre-print document
License: NOT PUBLIC - Private/restricted access
Size: 1.78 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/300607