Quality Estimation for Automatic Speech Recognition
Negri, Matteo; Turchi, Marco; Falavigna, Giuseppe Daniele
2014-01-01
Abstract
We address the problem of estimating the quality of Automatic Speech Recognition (ASR) output at the utterance level, without recourse to manual reference transcriptions and when information about the system's confidence is not accessible. Given a source signal and its automatic transcription, we approach this problem as a regression task in which the word error rate (WER) of the transcribed utterance has to be predicted. To this end, we explore the contribution of different feature sets and the potential of different algorithms in testing conditions of increasing complexity. Results show that our automatic quality estimates closely approximate the WER scores calculated over reference transcripts, outperforming a strong baseline in all the testing conditions.
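The regression target described above is the utterance-level WER, which can only be computed when a reference transcript exists (for example, to label training data). The sketch below is a minimal illustration, assuming the standard word-level Levenshtein definition WER = (substitutions + deletions + insertions) / number of reference words and whitespace tokenization. The helper `mean_absolute_error` is a hypothetical way of measuring how closely predicted scores approximate reference-based WER; the abstract does not name the evaluation metric, so treat that choice as an assumption.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level WER: (S + D + I) / N, with N = number of reference words.

    Assumes simple whitespace tokenization (no text normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (row 0 = j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def mean_absolute_error(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Hypothetical helper: average |predicted WER - reference WER| over utterances."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER is not bounded by 1: a hypothesis with many insertions can exceed it (e.g. `word_error_rate("a", "a b c")` gives 2.0), which matters when choosing a regression model and the range of its outputs.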