Experiments on Cross-System Acoustic Model Adaptation
Giuliani, Diego; Brugnara, Fabio
2007-01-01
Abstract
Most state-of-the-art automatic transcription systems generate word transcriptions of the incoming audio data through two or more decoding passes interleaved with acoustic model adaptation. It has been shown that better results are obtained when the adaptation procedure exploits a supervision generated by a system different from the one being adapted. In this paper, cross-system adaptation is investigated using supervisions generated by several systems built by varying the phoneme set and the acoustic front-end. Furthermore, an adaptation procedure is presented that uses multiple supervisions of the audio data to adapt the acoustic models within the MLLR framework. The gains achieved with cross-system adaptation, and by adapting the acoustic models with multiple intra-site and cross-site supervisions, are demonstrated on the English European parliamentary speeches task.
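The abstract does not say how the multiple supervisions are combined inside MLLR. Purely as an illustration, the sketch below assumes the simplest option: each supervision supplies a hard frame-to-Gaussian alignment, the alignments are pooled with per-supervision weights into one set of sufficient statistics, and a single global MLLR mean transform is then estimated row by row with the standard closed-form solution for diagonal covariances. The function names, the hard alignments, the weighting scheme and the single regression class are all hypothetical assumptions, not the authors' procedure.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# Hypothetical supervision format: (weight, alignment), where the alignment is a
# list of (frame_index, gaussian_id) pairs (hard alignment, an assumption).
Supervision = Tuple[float, Sequence[Tuple[int, int]]]


def solve(a: List[List[float]], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: too little adaptation data")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def accumulate(frames: Sequence[Vector], supervisions: Sequence[Supervision], dim: int):
    """Pool weighted occupancy counts and observation sums over all supervisions."""
    occ: Dict[int, float] = {}
    obs: Dict[int, Vector] = {}
    for weight, alignment in supervisions:
        for t, g in alignment:
            occ[g] = occ.get(g, 0.0) + weight
            acc = obs.setdefault(g, [0.0] * dim)
            for i in range(dim):
                acc[i] += weight * frames[t][i]
    return occ, obs


def estimate_mllr(means: Dict[int, Vector], variances: Dict[int, Vector],
                  occ: Dict[int, float], obs: Dict[int, Vector]) -> List[Vector]:
    """Estimate a global MLLR mean transform W (dim x (dim + 1)) one row at a time.

    For diagonal covariances, row i solves G_i w_i = k_i with
    G_i = sum_g occ_g / var_g[i] * xi_g xi_g^T and k_i = sum_g obs_g[i] / var_g[i] * xi_g.
    """
    dim = len(next(iter(means.values())))
    rows = []
    for i in range(dim):
        g_mat = [[0.0] * (dim + 1) for _ in range(dim + 1)]
        k_vec = [0.0] * (dim + 1)
        for g, count in occ.items():
            xi = [1.0] + list(means[g])  # extended mean vector [1, mu]
            inv_var = 1.0 / variances[g][i]
            for r in range(dim + 1):
                k_vec[r] += inv_var * obs[g][i] * xi[r]
                for c in range(dim + 1):
                    g_mat[r][c] += inv_var * count * xi[r] * xi[c]
        rows.append(solve(g_mat, k_vec))
    return rows


def adapt_mean(w: List[Vector], mean: Vector) -> Vector:
    """Apply the transform: mu_hat = W [1, mu]^T."""
    xi = [1.0] + list(mean)
    return [sum(wr * x for wr, x in zip(row, xi)) for row in w]
```

With pooled statistics, a supervision that disagrees with the others simply contributes its own frame-to-Gaussian counts in proportion to its weight; a real system would more likely use posterior occupation probabilities and a regression class tree rather than hard alignments and one global transform.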