IRIS Institutional Research Information System

Does Simultaneous Speech Translation need Simultaneous Models?

Sara Papi;Marco Gaido;Matteo Negri;Marco Turchi

2022-01-01

Abstract

In simultaneous speech translation (SimulST), finding the best trade-off between high output quality and low latency is a challenging task. To meet the latency constraints posed by different application scenarios, multiple dedicated SimulST models are usually trained and maintained, which incurs high computational costs. In this paper, motivated also by the growing attention to sustainable AI, we investigate whether a single model trained offline can serve both offline and simultaneous applications under different latency regimes without additional training or adaptation. Experiments on en→{de, es} show that, besides making it possible to adopt well-established offline architectures and training strategies without affecting latency, the offline solution achieves quality similar to or better than the standard SimulST training protocol and is competitive with the state-of-the-art system.

Year

2022

Appears in the following types:

4.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
Does_Simultaneous_Speech_Translation_need_Simultaneous_Models_.pdf (open access; type: pre-print document; license: Creative Commons)	463.54 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/335846
