Hands-free Speech Recognition using a Microphone Array

Omologo, Maurizio
1996-01-01

Abstract

This work discusses the use of microphone arrays for hands-free continuous speech recognition in noisy and reverberant environments, with specific reference to a project under way at the IRST laboratories. The final target is a system that provides accurate talker location as well as real-time dictation, and that can adapt itself to a new speaker and to new noise conditions. The work focuses on two main aspects: microphone array processing and robust speech recognition. The array consists of four omnidirectional microphones, placed in a line at a distance of 1.5 m in front of the talker. Given the array signals, a Time Delay Estimation technique based on the Crosspower Spectrum Phase (CSP) yields the talker location. A Time Delay Compensation (TDC) module then provides a beamformed signal, which is shown to be effective as input to a Hidden Markov Model (HMM) based recognizer. Given a small number of sentences collected from a new speaker in the real environment, HMM adaptation further improves the recognition rate. These results are confirmed both by experiments conducted in a noisy office environment and by simulation.
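
The abstract's two array-processing steps, CSP-based time delay estimation followed by time delay compensation, can be illustrated with a minimal sketch. It is not the implementation used in the project: the function names (`dft`, `csp_delay`, `delay_and_sum`), the integer-sample delay resolution, the `max_lag` search window, and the naive dependency-free DFT are all assumptions made for illustration. The sketch only estimates inter-channel delays; mapping those delays to a talker position would additionally require the array geometry and sampling rate, which the abstract does not give.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT, used here only to keep the sketch dependency-free."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(
            x[t] * cmath.exp(sign * 2j * math.pi * ((k * t) % n) / n)
            for t in range(n)
        )
        out.append(acc / n if inverse else acc)
    return out


def csp_delay(ref, sig, max_lag):
    """Estimate the delay of `sig` relative to `ref`, in whole samples.

    The cross-power spectrum is normalized to unit magnitude so that only
    its phase remains; the inverse transform then peaks at the delay.
    `max_lag` (hypothetical parameter) bounds the searched lags.
    """
    n = len(ref) + len(sig)  # zero-pad so circular lags do not wrap
    spec_ref = dft(list(ref) + [0.0] * (n - len(ref)))
    spec_sig = dft(list(sig) + [0.0] * (n - len(sig)))
    phase = []
    for a, b in zip(spec_ref, spec_sig):
        cross = b * a.conjugate()
        mag = abs(cross)
        phase.append(cross / mag if mag > 1e-12 else 0j)
    csp = [v.real for v in dft(phase, inverse=True)]
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: csp[lag % n])


def delay_and_sum(channels, delays):
    """Advance each channel by its estimated delay, then average them."""
    n = len(channels[0])
    out = [0.0] * n
    for channel, delay in zip(channels, delays):
        for t in range(n):
            src = t + delay
            if 0 <= src < n:
                out[t] += channel[src]
    return [v / len(channels) for v in out]
```

In use, one microphone would serve as the reference: `csp_delay(channels[0], ch, max_lag)` would be called for each channel `ch`, and the resulting delays passed to `delay_and_sum` to obtain the single beamformed signal fed to the recognizer. A practical system would process short frames with a fast FFT and interpolate the CSP peak for sub-sample resolution.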
Use this identifier to cite or link to this document: https://hdl.handle.net/11582/231