This work extends phrase-based statistical MT (SMT) with shallow syntax dependencies. Two string-to-chunks translation models are proposed: a factored model, which augments phrase-based SMT with layered dependencies, and a joint model, that extends the phrase translation table with microtags, i.e. per-word projections of chunk labels. Both rely on $n$-gram models of target sequences with different granularity: single words, microtags, chunks. In particular, $n$-grams defined over syntactic chunks should model syntactic constraints coping with word-group movements. Experimental analysis and evaluation conducted on two popular Chinese-English tasks suggest that the shallow-syntax joint-translation model has potential to outperform state-of-the-art phrase-based translation, with a reasonable computational overhead.
Shallow-Syntax Phrase-Based Translation: Joint versus Factored String-to-Chunk Models
Cettolo, Mauro;Federico, Marcello;Pighin, Daniele;Bertoldi, Nicola
2008-01-01
Abstract
This work extends phrase-based statistical MT (SMT) with shallow syntax dependencies. Two string-to-chunks translation models are proposed: a factored model, which augments phrase-based SMT with layered dependencies, and a joint model, that extends the phrase translation table with microtags, i.e. per-word projections of chunk labels. Both rely on $n$-gram models of target sequences with different granularity: single words, microtags, chunks. In particular, $n$-grams defined over syntactic chunks should model syntactic constraints coping with word-group movements. Experimental analysis and evaluation conducted on two popular Chinese-English tasks suggest that the shallow-syntax joint-translation model has potential to outperform state-of-the-art phrase-based translation, with a reasonable computational overhead.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.