Maximum Likelihood Endpoint Detection with Time-Domain Features
Orlandi, Marco;Santarelli, Alfiero;Falavigna, Giuseppe Daniele
2003-01-01
Abstract
In this paper we propose an effective, robust and computationally low-cost HMM-based start-endpoint detector for speech recognisers. Our first attempts follow the classical scheme of a feature extractor and a Viterbi classifier (used for voice activity detection), followed by a post-processing stage, but the ultimate goal we pursue is a pure HMM-based architecture capable of performing the endpointing task. The features used for voice activity detection are energy and zero crossing rate, together with AMDF (Average Magnitude Difference Function), which proves to be a valid alternative to energy; we further study the impact of grammar structures and training conditions on performance. Finally, we lay the basis for the investigation of pure HMM-based architectures. This work has been partially funded by the European project HOMEY (IST-2001-32434).
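To make the three time-domain features named in the abstract concrete, the following is a minimal sketch of how they could be computed per frame. It is not the authors' implementation: the function names, the frame length, hop size, AMDF lag and the logarithmic compression of the energy are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def log_energy(frame, floor=1e-10):
    """Log of the mean squared amplitude; `floor` (assumed value) avoids log(0)."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return math.log(max(mean_sq, floor))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def amdf(frame, lag):
    """Average Magnitude Difference Function at one lag:
    mean of |x[n] - x[n + lag]| over the samples where both terms exist."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def extract_features(samples, frame_len=200, hop=80, lag=20):
    """Return one (log energy, ZCR, AMDF) tuple per frame.
    The default frame_len, hop and lag are illustrative assumptions only."""
    return [(log_energy(f), zero_crossing_rate(f), amdf(f, lag))
            for f in frame_signal(samples, frame_len, hop)]
```

In a detector of the kind described, each feature tuple would be the observation vector passed to the Viterbi classifier. Like energy, the AMDF at a fixed lag grows with signal amplitude, but it needs only subtractions and absolute values rather than multiplications, which is one plausible reason it can stand in for energy in a low-cost front end.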