Statistical Cross-Language Information Retrieval using N-Best Query Translations

Federico, Marcello; Bertoldi, Nicola

This paper presents a novel statistical model for cross-language information retrieval. Given a written query in the source language, documents in the target language are ranked by integrating probabilities computed by two statistical models: a query-translation model, which generates the most probable term-by-term translations of the query, and a query-document model, which evaluates the likelihood of each document and translation. The two scores are integrated over the set of the N most probable translations of the query. Experimental results for N = 1, 5, and 10 are presented on the Italian-English bilingual track data used in the CLEF 2000 and 2001 evaluation campaigns.
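
The integration over N-best translations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the exact formula, so the sketch assumes that a document's score is the sum, over the N most probable translations, of the translation probability times a query-document likelihood. The smoothed unigram likelihood, the function names (query_likelihood, rank_documents), the toy documents, and the translation probabilities are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def query_likelihood(translation, doc_tokens, vocab_size, mu=1.0):
    """Toy query-document model (an assumption, not the paper's model):
    add-mu smoothed unigram likelihood of the translated query given the document."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    prob = 1.0
    for term in translation:
        prob *= (counts[term] + mu) / (total + mu * vocab_size)
    return prob


def rank_documents(nbest, docs, n):
    """Rank documents by summing, over the top-n translations,
    P(translation | source query) * P(translation | document).

    nbest: list of (translation_tokens, probability), sorted by decreasing probability.
    docs:  dict mapping document id to a list of tokens.
    """
    # Vocabulary covers all document terms and all translation terms.
    vocab = {t for tokens in docs.values() for t in tokens}
    vocab |= {t for translation, _ in nbest for t in translation}
    top = nbest[:n]
    scores = {
        doc_id: sum(p * query_likelihood(tr, tokens, len(vocab)) for tr, p in top)
        for doc_id, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical target-language (English) documents.
docs = {
    "d1": "the bank raised interest rates".split(),
    "d2": "the river shore flooded".split(),
}

# Hypothetical N-best term-by-term translations of a two-word Italian query,
# with made-up probabilities from a query-translation model.
nbest = [
    (["rate", "bank"], 0.5),
    (["rates", "bank"], 0.3),
    (["rate", "shore"], 0.2),
]

for n in (1, 3):
    print(n, rank_documents(nbest, docs, n))
```

Setting n=1 reduces the sketch to retrieval with the single best translation, while larger values of n let lower-ranked translations contribute to each document's score in proportion to their translation probability.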