This paper addresses the problem of voice activity detection for distant-talking speech recognition in noisy and reverberant environments. The proposed algorithm is based on the same Cross-power Spectrum Phase (CSP) analysis that is used for talker location and tracking. A normalized feature is derived from it and is shown to be more effective than an energy-based one. The algorithm exploits this feature by dynamically updating the detection threshold as a non-linear average of the feature values computed during the preceding pause. On a real multichannel database, recorded with the speaker at a distance of 2.5 meters from the microphones, experiments show that the proposed algorithm provides a considerable relative reduction in error rate.
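As an illustration of the kind of feature the abstract refers to, the following is a minimal sketch of a CSP (also known as GCC-PHAT) computation for one pair of microphone frames. It is not the paper's implementation: the function name `csp_peak`, the zero-padding length, and the choice of the peak value as the normalized feature are assumptions made for this example. The threshold update rule is not specified in the abstract and is therefore not sketched.

```python
import numpy as np

def csp_peak(x1, x2, eps=1e-12):
    """Peak of the CSP (GCC-PHAT) function for two equal-rate frames.

    Hypothetical helper, not taken from the paper. Returns (peak, lag):
    peak is roughly in [0, 1] and is high when both channels carry the
    same coherent source; lag is the delay of x1 relative to x2 in samples.
    """
    n = len(x1) + len(x2)              # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)           # cross-power spectrum
    cross /= np.abs(cross) + eps       # discard magnitude, keep phase only
    csp = np.fft.irfft(cross, n)       # back to the lag domain
    k = int(np.argmax(csp))
    lag = k if k < n // 2 else k - n   # map upper indices to negative lags
    return float(csp[k]), lag
```

Because the magnitude is normalized away in every frequency bin, the peak value does not depend on signal level, which is one plausible reason such a feature can behave better than frame energy when the talker is far from the microphones.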
Title: | Use of a CSP-based voice activity detector for distant-talking ASR |
Authors: | |
Publication date: | 2003 |
Handle: | http://hdl.handle.net/11582/934 |
Appears in types: | 4.1 Contribution in conference proceedings |