Lexical Modeling of ASR Errors for Robust Speech Translation

Mauro Cettolo; Matteo Negri; Marco Turchi
2021-01-01

Abstract

Error propagation from automatic speech recognition (ASR) to machine translation (MT) is a critical issue for the (still) dominant cascade approach to speech translation. To robustify MT to ill-formed inputs, we propose a technique to artificially corrupt clean transcripts so as to emulate noisy automatic transcripts. Our Lexical Noise model relies on estimating from ASR data: i) the probability distribution of the possible edit operations applicable to each word, and ii) the probability distribution of possible lexical substitutes for that word. Corrupted data generated from these probabilities are paired with their original clean counterpart for MT adaptation via fine-tuning. Contrastive experiments on three language pairs led to three main findings. First, on noisy transcripts, the adapted models outperform MT systems fine-tuned on synthetic data corrupted with previous noising techniques, approaching the upper bound performance obtained by fine-tuning on real ASR data. Second, the increased robustness does not come at the cost of performance drops on clean test data. Third, and crucial from the application standpoint, our approach is domain/ASR-independent: noising patterns learned from a given ASR system in a certain domain can be successfully applied to robustify MT to errors made by other ASR systems in a different domain.
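To make the procedure concrete, the sketch below illustrates one way such a Lexical Noise model could be applied once its distributions have been estimated from ASR data. This is a hedged illustration, not the authors' implementation: the function names (sample, corrupt), the dictionary layout of the distributions, and the fixed operation set keep/sub/del/ins are all assumptions based only on the abstract's description.

```python
import random

def sample(dist):
    """Draw one key from a {key: probability} dictionary."""
    r, acc = random.random(), 0.0
    for key, prob in dist.items():
        acc += prob
        if r <= acc:
            return key
    return key  # guard against floating-point rounding: fall back to last key

def corrupt(tokens, op_dist, sub_dist, ins_dist):
    """Emulate ASR noise on a clean transcript (list of tokens).

    op_dist[word]  -> distribution over edit operations for that word
    sub_dist[word] -> distribution over lexical substitutes for that word
    ins_dist       -> distribution over spuriously inserted words
    (all hypothetical structures, e.g. estimated by edit-distance
    alignment of reference transcripts with real ASR output)
    """
    noisy = []
    for tok in tokens:
        op = sample(op_dist.get(tok, {"keep": 1.0}))  # unseen words kept as-is
        if op == "keep":
            noisy.append(tok)
        elif op == "sub":
            noisy.append(sample(sub_dist[tok]))       # lexical substitute
        elif op == "del":
            continue                                  # recognizer dropped it
        elif op == "ins":
            noisy.append(tok)
            noisy.append(sample(ins_dist))            # spurious extra word
    return noisy

# Toy distributions; real ones would be estimated from ASR data.
op_dist = {"speech": {"keep": 0.8, "sub": 0.2}}
sub_dist = {"speech": {"speed": 0.7, "peach": 0.3}}
ins_dist = {"the": 0.5, "a": 0.5}
print(corrupt("robust speech translation".split(), op_dist, sub_dist, ins_dist))
```

Pairing each corrupted output with its original clean sentence would then yield the synthetic parallel data used for MT adaptation via fine-tuning, as the abstract describes.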
Files in this record:

martucci21_interspeech.pdf
  Access: open access
  Type: Post-print document
  License: DRM not defined
  Size: 284.09 kB
  Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/330792