
Neural versus phrase-based MT quality: An in-depth analysis on English–German and English–French

Luisa Bentivogli; Arianna Bisazza; Mauro Cettolo; Marcello Federico
2018-01-01

Abstract

Within the field of statistical machine translation, the neural approach (NMT) is advancing the state-of-the-art performance traditionally achieved by phrase-based approaches (PBMT) and is rapidly becoming the dominant technology in machine translation. Indeed, in the most recent IWSLT and WMT evaluation campaigns on machine translation, NMT outperformed well-established state-of-the-art PBMT systems on many different language pairs. To understand in what respects NMT provides better translation quality than PBMT, we perform a detailed analysis of neural versus phrase-based statistical machine translation outputs, leveraging high-quality post-edits performed by professional translators on the IWSLT data. In this analysis, we focus on two language directions with different characteristics: English–German, known to be particularly hard because of morphology and syntactic differences, and English–French, where PBMT systems typically reach outstanding quality and thus represent a strong competitor for NMT. Our analysis provides useful insights into which linguistic phenomena are best modelled by neural models, such as the reordering of verbs and nouns, while pointing out other aspects that remain to be improved, like the correct translation of proper nouns.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/312800