Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

Turchi, Marco
2011-01-01

Abstract

The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on its parallel training data: phrases that are absent from that data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data, using external morphological resources instead. A set of new phrase associations is added to the translation and reordering models; each new association is a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En–Fr and Fr–En translation, and the results showed improved automatic scores (BLEU and Meteor) and fewer out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add other types of information to the model.
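
The expansion idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy lexicons, the function names, the Jaccard tag overlap used as the similarity score, and the 0.5 threshold are all assumptions made for illustration, since the abstract does not specify the actual score or resources.

```python
from itertools import product

# Hypothetical toy morphological lexicons: surface form -> (lemma, tag set).
SRC_LEXICON = {
    "house": ("house", {"noun", "sg"}),
    "houses": ("house", {"noun", "pl"}),
}
TGT_LEXICON = {
    "maison": ("maison", {"noun", "sg", "fem"}),
    "maisons": ("maison", {"noun", "pl", "fem"}),
}


def word_variants(word, lexicon):
    """All surface forms sharing the word's lemma; unknown words stay as-is."""
    if word not in lexicon:
        return [word]
    lemma = lexicon[word][0]
    return sorted(form for form, (lem, _tags) in lexicon.items() if lem == lemma)


def phrase_variants(phrase, lexicon):
    """Every phrase obtained by swapping each word for one of its variants."""
    options = [word_variants(w, lexicon) for w in phrase.split()]
    return [" ".join(combo) for combo in product(*options)]


def phrase_tags(phrase, lexicon):
    """Union of the morphosyntactic tags of the known words in a phrase."""
    tags = set()
    for w in phrase.split():
        if w in lexicon:
            tags |= lexicon[w][1]
    return tags


def tag_similarity(src, tgt):
    """Assumed stand-in score: Jaccard overlap of source and target tags."""
    a = phrase_tags(src, SRC_LEXICON)
    b = phrase_tags(tgt, TGT_LEXICON)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand(phrase_table, threshold=0.5):
    """Propose new (source, target) associations from existing ones."""
    new_pairs = {}
    for src, tgt in phrase_table:
        candidates = product(
            phrase_variants(src, SRC_LEXICON),
            phrase_variants(tgt, TGT_LEXICON),
        )
        for s, t in candidates:
            if (s, t) in phrase_table:
                continue  # already known to the system
            score = tag_similarity(s, t)
            if score >= threshold:
                new_pairs[(s, t)] = score
    return new_pairs


table = {("house", "maison"): 0.8}
new_associations = expand(table)
```

In this toy setting the singular pair yields one plural association, while the mismatched singular/plural combinations fall below the threshold. A real system would also have to assign translation and reordering model scores to each new entry, which this sketch leaves out.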

Use this identifier to cite or link to this document: https://hdl.handle.net/11582/307933