This paper presents a novel method to suggest long word reorderings to a phrase-based SMT decoder. We address language pairs where long reordering concentrates on few patterns, and use fuzzy chunk-based rules to predict likely reorderings for these phenomena. Then we use reordered n-gram LMs to rank the re- sulting permutations and select the n-best for translation. Finally we encode these reorder- ings by modifying selected entries of the dis- tortion cost matrix, on a per-sentence basis. In this way, we expand the search space by a much finer degree than if we simply raised the distortion limit. The proposed techniques are tested on Arabic-English and German-English using well-known SMT benchmarks.
Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
Bisazza, Arianna;Federico, Marcello
2012-01-01
Abstract
This paper presents a novel method to suggest long word reorderings to a phrase-based SMT decoder. We address language pairs where long reordering concentrates on few patterns, and use fuzzy chunk-based rules to predict likely reorderings for these phenomena. Then we use reordered n-gram LMs to rank the re- sulting permutations and select the n-best for translation. Finally we encode these reorder- ings by modifying selected entries of the dis- tortion cost matrix, on a per-sentence basis. In this way, we expand the search space by a much finer degree than if we simply raised the distortion limit. The proposed techniques are tested on Arabic-English and German-English using well-known SMT benchmarks.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.