Fast Speech Decoding through Phone Confusion Networks
Bertoldi, Nicola; Federico, Marcello; Falavigna, Giuseppe Daniele; Gerosa, Matteo
2008-01-01
Abstract
We present a two-stage automatic speech recognition architecture suited to applications, such as spoken document retrieval, where large-scale language models can be used and very low out-of-vocabulary rates need to be reached. The proposed system couples a weakly constrained phone recognizer with a phone-to-word decoder that was originally developed for phrase-based statistical machine translation. The decoder can efficiently decode confusion networks given as input and can exploit large-scale unpruned language models. Preliminary experiments are reported on the transcription of speeches of the Italian parliament. Using phone confusion networks as the interface between the two decoding steps reduces the word error rate (WER) by 28%, bringing the system relatively close to a state-of-the-art baseline that uses a comparable language model.
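
To make the notion of a phone confusion network concrete, the following minimal Python sketch may help. It is not the authors' decoder: the phone labels, posterior values, toy lexicon, and the function name `best_word_sequence` are all assumptions made for illustration. It also scores hypotheses by phone posteriors alone, whereas the system described above uses a phrase-based translation decoder and a large language model.

```python
import math

def best_word_sequence(network, lexicon):
    """Toy dynamic-programming search over a phone confusion network.

    network: list of slots, each a dict {phone: posterior probability}.
    lexicon: dict {word: tuple of phones} (hypothetical pronunciations).
    Returns (log score, word list) for the best full cover of the network,
    or None if no word sequence covers every slot.
    """
    n = len(network)
    # best[i] holds the best (log score, words) covering slots [0, i).
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for start in range(n):
        if best[start] is None:
            continue
        base_score, base_words = best[start]
        for word, phones in lexicon.items():
            end = start + len(phones)
            if end > n:
                continue
            score = base_score
            for slot, phone in zip(network[start:end], phones):
                p = slot.get(phone, 0.0)
                if p <= 0.0:
                    score = None  # this word does not fit these slots
                    break
                score += math.log(p)
            if score is None:
                continue
            if best[end] is None or score > best[end][0]:
                best[end] = (score, base_words + [word])
    return best[n]

# Illustrative network: the 1-best phone string is "g a s a",
# which matches no sequence of lexicon words.
network = [
    {"g": 0.6, "k": 0.4},
    {"a": 0.9, "o": 0.1},
    {"s": 0.7, "z": 0.3},
    {"a": 0.8, "e": 0.2},
]
lexicon = {
    "casa": ("k", "a", "s", "a"),
    "case": ("k", "a", "s", "e"),
    "gas": ("g", "a", "s"),
}

score, words = best_word_sequence(network, lexicon)
print(words)  # ['casa']
```

In this toy case the single best phone string cannot be segmented into lexicon words, while the lower-ranked alternative in the first slot yields a valid word. This is the kind of recovery that passing a confusion network, rather than one phone string, to the second stage makes possible.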