On the Evaluation of Adaptive Machine Translation for Human Post-Editing
Bentivogli, Luisa; Bertoldi, Nicola; Cettolo, Mauro; Federico, Marcello; Negri, Matteo; Turchi, Marco
2016-01-01
Abstract
We investigate adaptive machine translation (MT) as a way to reduce human workload and enhance user experience when professional translators operate in real-life conditions. A crucial aspect of our analysis is how to ensure a reliable assessment of MT technologies aimed at supporting human post-editing. We pay particular attention to two evaluation aspects: i) the design of a sound experimental protocol to reduce the risk of collecting biased measurements, and ii) the use of robust statistical testing methods (linear mixed-effects models) to reduce the risk of under- or over-estimating the observed variations. Our adaptive MT technology is integrated into a full-fledged web-based computer-assisted translation (CAT) tool. We report on a post-editing field test that involved 16 professional translators working on two translation directions (English-Italian and English-French), with texts from two linguistic domains (legal and information technology). Our contrastive experiments compare user post-editing effort with static vs. adaptive MT in an end-to-end scenario where the system is evaluated as a whole. Our results show that adaptive MT leads to an overall reduction in post-editing effort (HTER) of up to 10.6% (p < 0.05). A follow-up manual evaluation of the MT outputs and their corresponding post-edits confirms that the gain in HTER corresponds to higher quality of the adaptive MT system and does not come at the expense of the final human translation quality. Indeed, adaptive MT is shown to return better suggestions than static MT (p < 0.01), and the resulting post-edits do not significantly differ between the two conditions.
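To illustrate the kind of statistical testing the abstract names, the sketch below fits a linear mixed-effects model that estimates the effect of the MT condition (static vs. adaptive) on per-segment HTER, with crossed random intercepts for translator and document. This is a minimal, hypothetical reconstruction in Python with statsmodels, not the authors' actual analysis: the file name, column names, and random-effects structure are assumptions, and the paper's own modeling choices may differ.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-segment data with columns:
#   hter       - edit rate between the MT suggestion and its post-edit
#   condition  - "static" or "adaptive"
#   translator - translator id
#   document   - document id
df = pd.read_csv("postediting_segments.csv")

# statsmodels expresses crossed random effects as variance components
# declared on a single group spanning all observations.
df["all"] = 1
model = smf.mixedlm(
    "hter ~ C(condition, Treatment('static'))",  # fixed effect of interest
    data=df,
    groups="all",
    vc_formula={
        "translator": "0 + C(translator)",  # random intercept per translator
        "document": "0 + C(document)",      # random intercept per document
    },
)
result = model.fit()
# The coefficient for the adaptive condition estimates the HTER difference
# after accounting for translator- and document-level variability.
print(result.summary())
```

Modeling translators and documents as random effects, rather than averaging over them, is what guards against the under- or over-estimation of variation mentioned in the abstract: repeated measurements from the same translator or document are correlated, and a plain t-test would ignore that dependence.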