WikiTranslate: Query Translation for Cross-lingual Information Retrieval using only Wikipedia

This paper presents WikiTranslate, a system that performs query translation for cross-lingual information retrieval (CLIR) using only Wikipedia as its translation resource. Queries are mapped to Wikipedia concepts, and the translations of these concepts in the target language are used to construct the final query. WikiTranslate is evaluated by searching an English document collection with topics formulated in Dutch, French and Spanish. The system achieved 67% of the performance of the monolingual baseline.
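The mapping-and-translation step can be illustrated with a minimal sketch. It assumes the Wikipedia data has already been reduced to a dictionary from lowercased source-language article titles to the titles of the corresponding English articles (for example via interlanguage links). It also uses greedy longest-match of query n-grams as a stand-in concept-mapping strategy. The names `INTERLANGUAGE_LINKS`, `map_to_concepts` and `translate_query` are hypothetical, and the code is not WikiTranslate's actual algorithm.

```python
# Hypothetical toy data: lowercased Dutch article titles mapped to the English
# article titles they link to. Real data would be extracted from a Wikipedia dump.
INTERLANGUAGE_LINKS = {
    "verenigde staten": "United States",
    "president": "President",
    "verkiezing": "Election",
    "klimaatverandering": "Climate change",
}


def map_to_concepts(query, titles, max_len=3):
    """Map a query to concepts by greedy longest match of n-grams against titles."""
    tokens = query.lower().split()
    concepts, i = [], 0
    while i < len(tokens):
        # Try the longest n-gram first, then shrink it one token at a time.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in titles:
                concepts.append(candidate)
                i += n
                break
        else:
            # No matching concept: keep the raw token (an assumption of this sketch).
            concepts.append(tokens[i])
            i += 1
    return concepts


def translate_query(query, links):
    """Replace each mapped concept with its target-language title, if one exists."""
    return " ".join(links.get(c, c) for c in map_to_concepts(query, links))


print(translate_query("president verkiezing Verenigde Staten", INTERLANGUAGE_LINKS))
# prints "President Election United States"
```

Keeping unmatched tokens untranslated is one possible design choice: it preserves proper nouns that have no Wikipedia article, at the cost of passing some source-language words into the English query.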
