Using Prosodic Information for Disambiguation Purposes

Gretter, Roberto; Seppi, Dino
2005-01-01

Abstract

This work describes how prosodic information can improve the performance of an Automatic Speech Recognizer (ASR) on specific restricted tasks. The approach exploits prosodic information in a post-processing stage: prosodic features are estimated at word level, encoded by a feature extractor, and then modeled with a statistical classifier. To train and test the system, we collected an Italian database designed to include specific dialogue problems such as ambiguous utterances. On the task of recognizing sequences of numbers, the proposed system yields a 69.5% relative word error rate reduction compared with a traditional state-of-the-art recognizer.
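
For reference, a relative word error rate reduction is conventionally computed against the baseline error rate. The sketch below uses notation introduced here for illustration ($\mathrm{WER}_{\text{base}}$ for the baseline recognizer, $\mathrm{WER}_{\text{pros}}$ for the system with prosodic post-processing); the abstract reports only the relative figure, so no absolute error rates are assumed.

```latex
% Conventional definition of relative WER reduction (notation is an assumption).
\[
  \Delta_{\text{rel}}
  = \frac{\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{pros}}}{\mathrm{WER}_{\text{base}}}
\]
% With the reported \Delta_{\text{rel}} = 0.695, the remaining error is
\[
  \mathrm{WER}_{\text{pros}}
  = (1 - 0.695)\,\mathrm{WER}_{\text{base}}
  = 0.305\,\mathrm{WER}_{\text{base}}
\]
```

Under this reading, the prosody-aware system would make roughly three tenths of the word errors of the baseline on the number-sequence task.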

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/3351