Using Prosodic Information for Disambiguation Purposes

Gretter, Roberto; Seppi, Dino

In this work, we describe how prosodic information can be employed to improve the performance of an Automatic Speech Recognizer (ASR) for specific restricted tasks. The approach exploits additional prosodic information in a post-processing stage. Prosodic features are estimated at word level; this additional information is encoded through a feature extractor and is then modeled using a statistical classifier. To train and test this system we collected an Italian database designed to comprise specific dialogue problems like ambiguous utterances. The proposed system yields a 69.5% relative word error rate reduction compared to a traditional state-of-the-art recognizer for the task of recognizing sequences of numbers.