FBK @ IWSLT 2014 - ASR track

Babaali, Bagher; Serizel, Romain Herve' Jacques; Jalalvand, Shahab; Gretter, Roberto; Falavigna, Giuseppe Daniele; Giuliani, Diego

This paper reports on the participation of FBK in the IWSLT 2014 evaluation campaign for Automatic Speech Recognition (ASR), which focused on the transcription of TED talks. The outputs of primary and contrastive systems were submitted for three languages, namely English, German and Italian. Most effort went into the development of the English transcription system. The primary system is based on the ROVER combination of the output of 5 transcription sub-systems which are all based on the Deep Neural Network - Hidden Markov Model (DNN-HMM) hybrid. Before combination, word lattices generated by each sub-system are rescored using an efficient interpolation of 4-gram and Recurrent Neural Network (RNN) language models. The primary system achieves a Word Error Rate (WER) of 14.7% and 11.4% on the 2013 and 2014 official IWSLT English test sets, respectively. The subspace Gaussian mixture model (SGMM) system developed for German achieves 39.5% WER on the 2014 IWSLT German test sets. For Italian, the primary transcription system was based on hidden Markov models and achieves 23.8% WER on the 2014 IWSLT Italian test set.