Lexical Modeling of ASR Errors for Robust Speech Translation

Mauro Cettolo; Matteo Negri; Marco Turchi
2021-01-01

Abstract

Error propagation from automatic speech recognition (ASR) to machine translation (MT) is a critical issue for the (still) dominant cascade approach to speech translation. To robustify MT to ill-formed inputs, we propose a technique to artificially corrupt clean transcripts so as to emulate noisy automatic transcripts. Our Lexical Noise model relies on estimating from ASR data: i) the probability distribution of the possible edit operations applicable to each word, and ii) the probability distribution of possible lexical substitutes for that word. Corrupted data generated from these probabilities are paired with their original clean counterpart for MT adaptation via fine-tuning. Contrastive experiments on three language pairs led to three main findings. First, on noisy transcripts, the adapted models outperform MT systems fine-tuned on synthetic data corrupted with previous noising techniques, approaching the upper bound performance obtained by fine-tuning on real ASR data. Second, the increased robustness does not come at the cost of performance drops on clean test data. Third, and crucial from the application standpoint, our approach is domain/ASR-independent: noising patterns learned from a given ASR system in a certain domain can be successfully applied to robustify MT to errors made by other ASR systems in a different domain.
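To make the procedure concrete, the sketch below illustrates one way such a Lexical Noise model could be applied once its distributions have been estimated from ASR data. This is a hedged illustration, not the authors' implementation: the function names (sample, corrupt), the dictionary layout of the distributions, and the fixed operation set keep/sub/del/ins are all assumptions based only on the abstract's description.

```python
import random

def sample(dist):
    """Draw one key from a {key: probability} dictionary."""
    r, acc = random.random(), 0.0
    for key, prob in dist.items():
        acc += prob
        if r <= acc:
            return key
    return key  # guard against floating-point rounding: fall back to last key

def corrupt(tokens, op_dist, sub_dist, ins_dist):
    """Emulate ASR noise on a clean transcript (list of tokens).

    op_dist[word]  -> distribution over edit operations for that word
    sub_dist[word] -> distribution over lexical substitutes for that word
    ins_dist       -> distribution over spuriously inserted words
    (all hypothetical structures, e.g. estimated by edit-distance
    alignment of reference transcripts with real ASR output)
    """
    noisy = []
    for tok in tokens:
        op = sample(op_dist.get(tok, {"keep": 1.0}))  # unseen words kept as-is
        if op == "keep":
            noisy.append(tok)
        elif op == "sub":
            noisy.append(sample(sub_dist[tok]))       # lexical substitute
        elif op == "del":
            continue                                  # recognizer dropped it
        elif op == "ins":
            noisy.append(tok)
            noisy.append(sample(ins_dist))            # spurious extra word
    return noisy

# Toy distributions; real ones would be estimated from ASR data.
op_dist = {"speech": {"keep": 0.8, "sub": 0.2}}
sub_dist = {"speech": {"speed": 0.7, "peach": 0.3}}
ins_dist = {"the": 0.5, "a": 0.5}
print(corrupt("robust speech translation".split(), op_dist, sub_dist, ins_dist))
```

Pairing each corrupted output with its original clean sentence would then yield the synthetic parallel data used for MT adaptation via fine-tuning, as the abstract describes.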
Files in this record:

martucci21_interspeech.pdf
  Access: open access
  Type: Post-print document
  License: DRM not defined
  Size: 284.09 kB
  Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/330792