Neural Machine Translation into Language Varieties
S. M. Lakew; M. Federico
2018-01-01
Abstract
Both research and commercial machine translation have so far neglected the importance of properly handling the spelling, lexical and grammar divergences occurring among language varieties. Notable cases are standard national varieties such as Brazilian and European Portuguese, and Canadian and European French, which popular online machine translation services are not keeping distinct. We show that an evident side effect of modeling such varieties as unique classes is the generation of inconsistent translations. In this work, we investigate the problem of training neural machine translation from English to specific pairs of language varieties, assuming both labeled and unlabeled parallel texts, and low-resource conditions. We report experiments from English to two pairs of dialects, European-Brazilian Portuguese and European-Canadian French, and two pairs of standardized varieties, Croatian-Serbian and Indonesian-Malay. We show significant BLEU score improvements over baseline systems when translation into similar languages is learned as a multilingual task with shared representations.