Statistical Cross-Language Information Retrieval using N-Best Query Translations
Federico, Marcello;Bertoldi, Nicola
2002-01-01
Abstract
This paper presents a novel statistical model for cross-language information retrieval. Given a written query in the source language, documents in the target language are ranked by integrating probabilities computed by two statistical models: a query-translation model, which generates the most probable term-by-term translations of the query, and a query-document model, which evaluates the likelihood of each document and translation. The two scores are integrated over the set of N most probable translations of the query. Experimental results for N = 1, 5, and 10 are presented on the Italian-English bilingual track data used in the CLEF 2000 and 2001 evaluation campaigns.
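
The ranking scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the integration, so this example assumes that a document's score is the sum, over the N best translations, of the translation probability times the query-document probability. The function name `rank_documents`, the callback `query_doc_prob`, and all toy data are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Translation = Tuple[str, ...]  # one term-by-term translation of the query


def rank_documents(
    nbest: Sequence[Tuple[Translation, float]],
    doc_ids: Sequence[str],
    query_doc_prob: Callable[[Translation, str], float],
    n: int,
) -> List[Tuple[str, float]]:
    """Rank documents by summing over the n most probable translations.

    Assumed scoring rule (not confirmed by the abstract):
        score(d) = sum over top-n translations t of P(t | query) * P(t, d)
    """
    # Keep only the n most probable translations.
    top = sorted(nbest, key=lambda pair: pair[1], reverse=True)[:n]
    scores = {
        d: sum(p_t * query_doc_prob(t, d) for t, p_t in top)
        for d in doc_ids
    }
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: two candidate translations and two documents (all hypothetical).
T1: Translation = ("bank", "river")
T2: Translation = ("bench", "river")
NBEST = [(T1, 0.6), (T2, 0.4)]
TABLE = {(T1, "d1"): 0.2, (T1, "d2"): 0.1, (T2, "d1"): 0.0, (T2, "d2"): 0.5}


def toy_query_doc_prob(t: Translation, d: str) -> float:
    # Stand-in for a real query-document model: a fixed lookup table.
    return TABLE[(t, d)]


ranking_n1 = rank_documents(NBEST, ["d1", "d2"], toy_query_doc_prob, n=1)
ranking_n2 = rank_documents(NBEST, ["d1", "d2"], toy_query_doc_prob, n=2)
```

In this toy setup the top-ranked document changes when N goes from 1 to 2, which shows why considering more than the single best translation can affect retrieval.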