Neural Machine Translation into Language Varieties

S. M. Lakew; M. Federico
2018

Abstract

Both research and commercial machine translation have so far neglected the importance of properly handling the spelling, lexical and grammar divergences occurring among language varieties. Notable cases are standard national varieties such as Brazilian and European Portuguese, and Canadian and European French, which popular online machine translation services are not keeping distinct. We show that an evident side effect of modeling such varieties as unique classes is the generation of inconsistent translations. In this work, we investigate the problem of training neural machine translation from English to specific pairs of language varieties, assuming both labeled and unlabeled parallel texts, and low-resource conditions. We report experiments from English to two pairs of dialects, European-Brazilian Portuguese and European-Canadian French, and two pairs of standardized varieties, Croatian-Serbian and Indonesian-Malay. We show significant BLEU score improvements over baseline systems when translation into similar languages is learned as a multilingual task with shared representations.
ISBN: 978-1-948087-81-0
Use this identifier to cite or link to this document: http://hdl.handle.net/11582/316279