Is a Query Worth Translating: Ask the Users!

Users in many regions of the world are multilingual, and they issue similar queries in different languages. Given a source-language query, we propose query picking, which involves finding equivalent target-language queries in a large query log. Query picking treats translation as a search problem and can serve as a translation method for cross-language and multilingual search. Further, because users usually issue queries when they expect to find relevant content, the success of query picking can serve as a strong indicator of the projected success of cross-language and multilingual search. In this paper, we describe a system that performs query picking, and we show that picked queries yield results that are statistically indistinguishable from a monolingual baseline. Further, using query picking to predict the effectiveness of cross-language results can have a statistically significant effect on the success of multilingual search, with improvements over a monolingual baseline. In contrast, multilingual merging methods that do not account for the success of query picking can often hurt retrieval effectiveness.
