Dynamic Language Model Focusing for Automatic Transcription of Talk-Show TV Programs

Falavigna, Giuseppe Daniele; Gretter, Roberto; Brugnara, Fabio; Giuliani, Diego

This paper proposes an approach for unsupervised dynamic adaptation of the language model used in an automatic transcription task. The approach aims to build language models "focused" on the linguistic content and speaking style of the audio documents to be transcribed, by adapting a general-purpose language model on a running window of text derived from automatic recognition hypotheses. The text in each window is used to automatically select documents from the same corpus used to train the general-purpose language model. In particular, a fast selection approach has been developed and compared with a more traditional one used in the information retrieval area. The proposed approach allows real-time selection of documents and, hence, frequent language model adaptation on a short window of text (fewer than 100 words). Experiments have been carried out on six episodes of two Italian TV talk-show programs, varying the size and advancement step of the running window and the corresponding number of words selected for focusing the language models. A relative reduction in word error rate of about 5.0% has been obtained using the running window for focusing the language models, compared with a relative reduction of 3.5% achieved using the whole automatic transcription of each talk-show episode.
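The window-based selection loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify either selection method, so plain TF-IDF cosine similarity is assumed here as a stand-in for the "more traditional" information retrieval approach, and the fast method is not reproduced. All function names and parameters (`running_windows`, `select_documents`, `max_words`, and so on) are hypothetical.

```python
import math
from collections import Counter


def running_windows(words, size, step):
    """Yield successive windows of `size` words, advancing by `step` words."""
    for start in range(0, max(len(words) - size, 0) + 1, step):
        yield words[start:start + size]


def build_idf(corpus):
    """Compute smoothed inverse document frequencies over tokenized documents."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def tfidf(tokens, idf):
    """Map a token list to a sparse TF-IDF vector (out-of-corpus words are dropped)."""
    counts = Counter(t for t in tokens if t in idf)
    return {w: c * idf[w] for w, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if not u or not v:
        return 0.0
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def select_documents(window, corpus, idf, max_words):
    """Return indices of the most similar documents, up to a total word budget.

    Assumption: the budget is counted in corpus words, mirroring the
    "number of words selected" mentioned in the abstract.
    """
    query = tfidf(window, idf)
    ranked = sorted(
        range(len(corpus)),
        key=lambda i: cosine(query, tfidf(corpus[i], idf)),
        reverse=True,
    )
    selected, total = [], 0
    for i in ranked:
        if total >= max_words:
            break
        selected.append(i)
        total += len(corpus[i])
    return selected
```

In use, each window of recognized words would be passed to `select_documents`, and the selected documents would feed whatever adaptation step produces the focused language model for the next stretch of audio. That adaptation step is left out here because the abstract does not describe it.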