A Baseline for the Transcription of Italian Broadcast News
Brugnara, Fabio;Cettolo, Mauro;Federico, Marcello;Giuliani, Diego
2000-01-01
Abstract
This paper presents the first achievements in the development of a broadcast news transcription system intended for processing large audio archives. In particular, the Italian broadcast news corpus under collection is introduced, and the first implemented baseline system is outlined. The baseline system consists of an audio segmentation module and a speech recognizer featuring a recursive Viterbi beam search, a 64K-word lexicon, a tree-based trigram LM representation, and MLLR adaptation. The word error rate of the baseline was 20.9% on planned studio speech and 28.8% on the whole test set.