Microphone arrays can be advantageously employed in Automatic Speech Recognition (ASR) systems to allow distant-talking interaction. Their beamforming capabilities are used to enhance the speech message while attenuating the undesired contribution of environmental noise and reverberation. The first part of the chapter briefly reviews the state of the art of ASR systems, with particular attention to robustness in distant-talking applications. The objective is to reduce the mismatch between real noisy data and the acoustic models used by the recognizer. Beamforming, speech enhancement, feature compensation and model adaptation are the techniques adopted to this end. The second part of the chapter describes a microphone-array-based speech recognition system developed at ITC-IRST. It includes a linear array beamformer, an acoustic front-end for speech activity detection and feature extraction, a recognition engine based on Hidden Markov Models, and the modules for training and adaptation of the acoustic models. Finally, the performance of this system on a typical recognition task is reported.
Speech Recognition with Microphone Arrays
Omologo, Maurizio;Matassoni, Marco;Svaizer, Piergiorgio
2001-01-01