Exploiting query logs for cross-lingual query suggestions

Query suggestion aims to suggest relevant queries for a given query, which helps users better specify their information needs. Previous work on query suggestion has been limited to the same language. In this article, we extend it to cross-lingual query suggestion (CLQS): for a query in one language, we suggest similar or relevant queries in other languages. This is very important to the scenarios of cross-language information retrieval (CLIR) and other related cross-lingual applications. Instead of relying on existing query translation technologies for CLQS, we present an effective means to map the input query of one language to queries of the other language in the query log. Important monolingual and cross-lingual information such as word translation relations and word co-occurrence statistics, and so on, are used to estimate the cross-lingual query similarity with a discriminative model. Benchmarks show that the resulting CLQS system significantly outperforms a baseline system that uses dictionary-based query translation. Besides, we evaluate CLQS with French-English and Chinese-English CLIR tasks on TREC-6 and NTCIR-4 collections, respectively. The CLIR experiments using typical retrieval models demonstrate that the CLQS-based approach has significantly higher effectiveness than several traditional query translation methods. We find that when combined with pseudo-relevance feedback, the effectiveness of CLIR using CLQS is enhanced for different pairs of languages.

[1]  Hsi-Jian Lee,et al.  Anchor text mining for translation extraction of query terms , 2001, SIGIR '01.

[2]  Wei-Ying Ma,et al.  Query Expansion by Mining User Logs , 2003, IEEE Trans. Knowl. Data Eng..

[3]  Franz Josef Och,et al.  Statistical machine translation: from single word models to alignment templates , 2002 .

[4]  Stephen E. Robertson,et al.  Relevance weighting of search terms , 1976, J. Am. Soc. Inf. Sci..

[5]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[6]  W. Bruce Croft,et al.  Resolving ambiguity for cross-language retrieval , 1998, SIGIR '98.

[7]  Sora Choi,et al.  Rich results from poor resources: NTCIR-4 monolingual and cross-lingual retrieval of korean texts using chinese and english , 2005, TALIP.

[8]  Ji-Rong Wen,et al.  Query clustering using user logs , 2002, TOIS.

[9]  Pu-Jen Cheng,et al.  Translating unknown queries with web corpora for cross-language information retrieval , 2004, SIGIR '04.

[10]  Stephen E. Robertson,et al.  On Term Selection for Query Expansion , 1991, J. Documentation.

[11]  Philipp Koehn,et al.  Europarl: A Parallel Corpus for Statistical Machine Translation , 2005, MTSUMMIT.

[12]  Sung-Hyon Myaeng,et al.  Using Mutual Information to Resolve Query Translation Ambiguities and Query Term Weighting , 1999, ACL.

[13]  Carol Peters,et al.  Cross-Language Information Retrieval (CLIR) Track Overview , 1997, TREC.

[14]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[15]  Ming Zhou,et al.  A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation , 2007, ACL.

[16]  Wei Gao,et al.  Cross-lingual query suggestion using query logs of different languages , 2007, SIGIR.

[17]  Hsin-Hsi Chen,et al.  Novel Association Measures Using Web Search with Double Checking , 2006, ACL.

[18]  Jian-Yun Nie,et al.  Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web , 1999, SIGIR '99.

[19]  Christof Monz,et al.  Iterative translation disambiguation for cross-language information retrieval , 2005, SIGIR '05.

[20]  David A. Hull Using statistical testing in the evaluation of retrieval experiments , 1993, SIGIR.

[21]  Hsin-Hsi Chen,et al.  Overview of CLIR Task at the Fourth NTCIR Workshop , 2004, NTCIR.

[22]  John D. Lafferty,et al.  A study of smoothing methods for language models applied to Ad Hoc information retrieval , 2001, SIGIR '01.

[23]  Changning Huang,et al.  Improving query translation for cross-language information retrieval using statistical models , 2001, SIGIR '01.

[24]  Wessel Kraaij,et al.  Embedding Web-Based Statistical Translation Models in Cross-Language Information Retrieval , 2003, CL.

[25]  J. J. Rocchio,et al.  Relevance feedback in information retrieval , 1971 .

[26]  W. Bruce Croft,et al.  Cross-lingual relevance models , 2002, SIGIR '02.

[27]  W. Bruce Croft,et al.  Phrasal translation and query expansion techniques for cross-language information retrieval , 1997, SIGIR '97.

[28]  Julio Gonzalo,et al.  Noun phrases as building blocks for cross-language Search Assistance , 2005, Inf. Process. Manag..

[29]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[30]  W. Bruce Croft,et al.  Finding similar questions in large question and answer archives , 2005, CIKM '05.

[31]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[32]  Ming Zhou,et al.  Measure Word Generation for English-Chinese SMT Systems , 2008, ACL.

[33]  Jianfeng Gao,et al.  Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependence relations , 2002, SIGIR '02.

[34]  Gerard Salton,et al.  The SMART Retrieval System—Experiments in Automatic Document Processing , 1971 .

[35]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[36]  Stephen E. Robertson,et al.  Okapi at TREC-3 , 1994, TREC.

[37]  Stephen E. Robertson,et al.  GatfordCentre for Interactive Systems ResearchDepartment of Information , 1996 .

[38]  John D. Lafferty,et al.  Model-based feedback in the language modeling approach to information retrieval , 2001, CIKM '01.

[39]  Charles L. A. Clarke,et al.  Comparing query logs and pseudo-relevance feedbackfor web-search query refinement , 2007, SIGIR.

[40]  Turid Hedlund,et al.  Dictionary-Based Cross-Language Information Retrieval: Problems, Methods, and Research Findings , 2001, Information Retrieval.

[41]  Ying Zhang,et al.  Using the web for automated translation extraction in cross-language information retrieval , 2004, SIGIR '04.

[42]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[43]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[44]  Vamshi Ambati Using Monolingual Clickthrough Data to Build Cross-lingual Search Systems , 2006 .

[45]  David F. Gleich,et al.  SVD Subspace Projections for Term Suggestion Ranking and Clustering , 2004 .

[46]  James Mayfield,et al.  Comparing cross-language query expansion techniques by degrading translation resources , 2002, SIGIR '02.

[47]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[48]  Tetsuya Ishikawa,et al.  Applying Machine Translation to Two-Stage Cross-Language Information , 2000, AMTA.

[49]  Changning Huang,et al.  Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach , 2005, CL.