Online Language Model Adaptation via N-gram Mixtures for Statistical Machine Translation

Cettolo, Mauro
2010-01-01

Abstract

The problem of language model adaptation in statistical machine translation is considered. A mixture of language models is employed, obtained by clustering the bilingual training data. Unsupervised clustering is guided by either the development or the test set. Different mixture weight estimation schemes are proposed and compared, operating either at the level of single source sentences or over the whole input set. Experimental results show that weighting several specific language models according to the actual input, rather than using a single target language model, improves translation quality as measured by BLEU and TER.
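The abstract does not spell out the estimation procedure, but a linear mixture of cluster-specific n-gram models with weights fitted by EM on the adaptation text is the standard construction for this kind of online adaptation. The sketch below illustrates that idea only; the class and function names (UnigramLM, em_mixture_weights), the add-one unigram smoothing, and the fixed iteration count are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of an online n-gram mixture: linear interpolation of
# cluster-specific language models, with weights estimated by EM so that the
# mixture maximizes the likelihood of the current input (e.g. a source
# sentence or the whole test set routed through the clusters).

from collections import Counter
from typing import Dict, List


class UnigramLM:
    """Add-one smoothed unigram model standing in for a cluster-specific LM."""

    def __init__(self, sentences: List[List[str]], vocab: List[str]):
        counts = Counter(w for s in sentences for w in s)
        total = sum(counts.values())
        v = len(vocab)
        # P(w) with add-one smoothing over a shared vocabulary.
        self.prob: Dict[str, float] = {
            w: (counts[w] + 1) / (total + v) for w in vocab
        }

    def p(self, word: str) -> float:
        return self.prob.get(word, 1e-9)


def em_mixture_weights(models: List[UnigramLM],
                       adaptation_text: List[str],
                       iterations: int = 20) -> List[float]:
    """Estimate interpolation weights maximizing the likelihood of the
    adaptation text under the mixture P(w) = sum_i lambda_i * P_i(w)."""
    k = len(models)
    lambdas = [1.0 / k] * k
    for _ in range(iterations):
        expected = [0.0] * k
        for w in adaptation_text:
            # E-step: posterior responsibility of each component for word w.
            joint = [lam * m.p(w) for lam, m in zip(lambdas, models)]
            z = sum(joint) or 1e-12
            for i in range(k):
                expected[i] += joint[i] / z
        # M-step: renormalize expected counts into new weights.
        total = sum(expected)
        lambdas = [e / total for e in expected]
    return lambdas


if __name__ == "__main__":
    vocab = ["the", "court", "ruled", "market", "prices", "rose", "today"]
    legal = [["the", "court", "ruled", "today"]] * 3
    finance = [["market", "prices", "rose", "today"]] * 3
    models = [UnigramLM(legal, vocab), UnigramLM(finance, vocab)]

    # Weights adapted to a finance-flavoured input sentence: the second
    # component receives most of the mass.
    weights = em_mixture_weights(models, ["prices", "rose", "today"])
    print([round(w, 3) for w in weights])
```

In this setting the weights can be re-estimated per source sentence (fine-grained adaptation) or once over all source sentences (a single global mixture), which is the trade-off the abstract refers to.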

Use this identifier to cite or link to this record: https://hdl.handle.net/11582/14068