Advances in the Automatic Transcription of Lectures

Cettolo, Mauro;Brugnara, Fabio;Federico, Marcello
2004

Abstract

Transcribing lectures is a challenging task for both acoustic and language modeling. In this work, we present recent results on the automatic transcription of lectures from the Translanguage English Database, which contains recordings of talks given in English at Eurospeech '93, mostly by non-native speakers. For acoustic modeling, an acoustic model trained for a broadcast news transcription task was adapted to the lecture training data through Maximum Likelihood Linear Regression adaptation, including models of spontaneous speech phenomena. Moreover, a normalization procedure was incorporated into the training stage, consisting of a cluster-based mean and variance normalization of the static features. Language modeling was based on adaptation of a background language model estimated on broadcast news transcripts, conference proceedings, lecture transcripts, and conversational speech transcripts. Among the adaptation techniques examined, the most effective one exploited the paper presented in each lecture to be processed. The best transcription performance on a 2-hour test set was a 32.4% word error rate.

Files in this item:
cetEtAlIcassp2004.pdf (not available)

Description: article
Type: Post-print document
License: NOT PUBLIC - Private/restricted access
Size: 35.48 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11582/2287