IRIS Institutional Research Information System

This paper describes FBK’s submission to the end-to-end speech translation (ST) task at IWSLT 2019. The task consists in the “direct” translation (ie without intermediate discrete representation) of English speech data derived from TED Talks or lectures into German texts. Our participation had a twofold goal: i) testing our latest models, and ii) evaluating the contribution to model training of different data augmentation techniques. On the model side, we deployed our recently proposed S-Transformer with logarithmic distance penalty, an ST-oriented adaptation of the Transformer architecture widely used in machine translation (MT). On the training side, we focused on data augmentation techniques recently proposed for ST and automatic speech recognition (ASR). In particular, we exploited augmented data in different ways and at different stages of the process. We first trained an end-to-end ASR system and used the weights of its encoder to initialize the decoder of our ST model (transfer learning). Then, we used an English-German MT system trained on large data to translate the English side of the English-French training set into German, and used this newly-created data as additional training material. Finally, we trained our models using SpecAugment, an augmentation technique that randomly masks portions of the spectrograms in order to make them different at every training epoch. Our synthetic corpus and SpecAugment resulted in an improvement of 5 BLEU points over our baseline model on the test set of MuST-C En-De, reaching the score of 22.3 with a single end-to-end system.

Data Augmentation for End-to-End Speech Translation: FBK@IWSLT '19

Mattia A. Di Gangi;Matteo Negri;Viet Nhat Nguyen;Amirhossein Tebbifakhr;Marco Turchi

2019-01-01

Abstract

This paper describes FBK’s submission to the end-to-end speech translation (ST) task at IWSLT 2019. The task consists in the “direct” translation (ie without intermediate discrete representation) of English speech data derived from TED Talks or lectures into German texts. Our participation had a twofold goal: i) testing our latest models, and ii) evaluating the contribution to model training of different data augmentation techniques. On the model side, we deployed our recently proposed S-Transformer with logarithmic distance penalty, an ST-oriented adaptation of the Transformer architecture widely used in machine translation (MT). On the training side, we focused on data augmentation techniques recently proposed for ST and automatic speech recognition (ASR). In particular, we exploited augmented data in different ways and at different stages of the process. We first trained an end-to-end ASR system and used the weights of its encoder to initialize the decoder of our ST model (transfer learning). Then, we used an English-German MT system trained on large data to translate the English side of the English-French training set into German, and used this newly-created data as additional training material. Finally, we trained our models using SpecAugment, an augmentation technique that randomly masks portions of the spectrograms in order to make them different at every training epoch. Our synthetic corpus and SpecAugment resulted in an improvement of 5 BLEU points over our baseline model on the test set of MuST-C En-De, reaching the score of 22.3 with a single end-to-end system.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2019

Appare nelle tipologie:

4.1 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
IWSLT2019_paper_29.pdf accesso aperto Licenza: Creative commons Dimensione 219.85 kB Formato Adobe PDF Visualizza/Apri	219.85 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11582/320853

Citazioni

ND

social impact