Both research and commercial machine translation have so far neglected the importance of properly handling the spelling, lexical and grammar divergences occurring among language varieties. Notable cases are standard national varieties such as Brazilian and European Portuguese, and Canadian and European French, which popular online machine translation services are not keeping distinct. We show that an evident side effect of modeling such varieties as unique classes is the generation of inconsistent translations. In this work, we investigate the problem of training neural machine translation from English to specific pairs of language varieties, assuming both labeled and unlabeled parallel texts, and low-resource conditions. We report experiments from English to two pairs of dialects, European-Brazilian Portuguese and European-Canadian French, and two pairs of standardized varieties, Croatian-Serbian and Indonesian-Malay. We show significant BLEU score improvements over baseline systems when translation into similar languages is learned as a multilingual task with shared representations.
|Title:||Neural Machine Translation into Language Varieties|
Lakew, Surafel Melaku (Corresponding)
|Publication date:||2018|
|Appears in types:||4.1 Contribution in conference proceedings|