Combining Quality Estimation and Automatic Post-editing to Enhance Machine Translation Output

Rajen Chatterjee, Matteo Negri, Marco Turchi
2018

Abstract

We investigate different strategies for combining quality estimation (QE) and automatic post-editing (APE) to improve the output of machine translation (MT) systems. The joint contribution of the two technologies is analyzed in different settings, in which QE serves as either: i) an activator of APE corrections, ii) a guidance to APE corrections, or iii) a selector of the final output to be returned to the user. In the first case (QE as activator), sentence-level predictions on the raw MT output quality are used to trigger its automatic correction when the estimated (TER) scores are below a certain threshold. In the second case (QE as guidance), word-level binary quality predictions (“good”/“bad”) are used to inform APE about problematic words in the MT output that should be corrected. In the last case (QE as selector), both sentence- and word-level quality predictions are used to identify the most accurate translation between the original MT output and its post-edited version. For the sake of comparison, the underlying APE technologies explored in our evaluation are both phrase-based and neural. Experiments are carried out on the English-German data used for the QE/APE shared tasks organized within the First Conference on Machine Translation (WMT 2016). Our evaluation shows positive but mixed results, with higher performance observed when word-level QE is used as a selector for neural APE applied to the output of a phrase-based MT system. Overall, our findings motivate further investigation on QE technologies. By reducing the gap between the performance of current solutions and “oracle” results, QE could significantly add to competitive APE technologies.
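The three QE roles described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the `qe_score` interface, and the threshold value are all assumptions made for clarity.

```python
# Illustrative sketch of the three QE+APE combination strategies from the
# abstract. All names and the threshold value are hypothetical.

def qe_as_activator(mt_output, predicted_ter, apply_ape, threshold=0.4):
    """QE as activator: trigger APE when the sentence-level TER estimate
    is below the chosen threshold, as described in the abstract."""
    if predicted_ter < threshold:
        return apply_ape(mt_output)
    return mt_output

def qe_as_guidance(mt_tokens, word_labels):
    """QE as guidance: collect positions of words labelled "bad" so an
    APE system can focus its corrections on them."""
    return [i for i, label in enumerate(word_labels) if label == "bad"]

def qe_as_selector(mt_output, ape_output, qe_score):
    """QE as selector: return whichever translation QE estimates as better
    (qe_score is assumed to return an estimated TER; lower is better)."""
    return mt_output if qe_score(mt_output) <= qe_score(ape_output) else ape_output
```

In the selector setting, for instance, `qe_as_selector` keeps the raw MT output whenever its estimated TER is no worse than that of the post-edited version, which mirrors the output-selection scenario evaluated in the paper.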
Open access

Type: Post-print
License: Creative Commons
Size: 4.3 MB, Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/314427