Assessment of learning to rank methods for query expansion

Pseudo relevance feedback is an effective query expansion method that can significantly improve information retrieval performance. However, it may hurt retrieval performance when irrelevant terms are included in the expanded query, so the expansion terms need to be refined. Learning to rank methods have proven effective at ranking problems in information retrieval, placing the most relevant documents at the top of the returned list, yet few attempts have been made to employ them for term refinement in pseudo relevance feedback. This article proposes a novel framework that explores the feasibility of using learning to rank to optimize pseudo relevance feedback by reranking the candidate expansion terms. We investigate learning approaches for choosing the candidate terms and apply several state-of-the-art learning to rank methods to refine the expansion terms. In addition, we propose two term labeling strategies and examine the usefulness of various term features to optimize the framework. Experimental results on three TREC collections show that our framework can effectively improve retrieval performance.
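The pipeline described above, where candidate expansion terms are extracted from pseudo-relevant documents, represented by term features, and then reranked by a learned model, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the feature functions, the hypothetical term labels, and the use of scikit-learn's GradientBoostingRegressor as a pointwise stand-in for a learning to rank method are all assumptions made for illustration.

```python
# Minimal sketch of reranking candidate expansion terms for pseudo relevance
# feedback with a learned ranker. All data, features, and labels below are
# illustrative assumptions, not the paper's actual setup.
import math
from collections import Counter
from sklearn.ensemble import GradientBoostingRegressor

def candidate_terms(feedback_docs, top_k=20):
    """Collect candidate expansion terms from the top-ranked (pseudo-relevant) documents."""
    counts = Counter(t for doc in feedback_docs for t in doc.split())
    return [t for t, _ in counts.most_common(top_k)]

def term_features(term, feedback_docs, collection):
    """Toy term features: frequency in the feedback set, document frequency, and an IDF-style score."""
    tf = sum(doc.split().count(term) for doc in feedback_docs)
    df = sum(term in doc.split() for doc in collection)
    idf = math.log((len(collection) + 1) / (df + 1))
    return [tf, df, idf]

# Toy collection and pseudo-relevant documents (assumed data for illustration).
collection = ["learning to rank for retrieval",
              "query expansion with feedback terms",
              "pseudo relevance feedback improves recall",
              "ranking documents by relevance"]
feedback_docs = collection[:2]

terms = candidate_terms(feedback_docs)
X = [term_features(t, feedback_docs, collection) for t in terms]
# Hypothetical term labels; in the paper's setting these would come from a
# term labeling strategy applied to TREC training queries.
y = [1.0 if t in ("ranking", "expansion", "feedback") else 0.0 for t in terms]

# A pointwise regressor stands in for a learning to rank method here.
ranker = GradientBoostingRegressor(n_estimators=50).fit(X, y)
reranked = sorted(terms,
                  key=lambda t: ranker.predict([term_features(t, feedback_docs, collection)])[0],
                  reverse=True)
print("Top expansion terms:", reranked[:5])
```

In practice the top reranked terms would be appended to the original query (with appropriate weights) before the second retrieval pass; listwise or pairwise rankers such as LambdaMART or RankSVM could replace the pointwise stand-in used here.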
