Reducing the risk of query expansion via robust constrained optimization

We introduce a new theoretical derivation, evaluation methods, and extensive empirical analysis for an automatic query expansion framework in which model estimation is cast as a robust constrained optimization problem. This framework provides a powerful method for modeling and solving complex expansion problems by allowing multiple sources of domain knowledge or evidence to be encoded as simultaneous optimization constraints. Our robust optimization approach provides a clean theoretical way to model not only expansion benefit but also expansion risk, by optimizing over uncertainty sets for the data. In addition, we introduce risk-reward curves to visualize expansion algorithm performance and analyze parameter sensitivity. We show that a robust approach significantly reduces the number and magnitude of expansion failures for a strong baseline algorithm, with no loss in average gain. Our approach is implemented as a highly efficient post-processing step that assumes little about the baseline expansion method used as input, making it easy to apply to existing expansion methods. We provide analysis showing that this approach is a natural and effective way to perform selective expansion: it automatically reduces or avoids expansion in risky scenarios and successfully attenuates noise in poor baseline methods.
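
As a rough illustration of the risk-reward idea, the sketch below computes one (risk, reward) point per expansion setting from per-query average precision (AP) scores. The abstract does not define either axis, so the definitions used here are assumptions for illustration only: risk is taken as the total AP lost over queries that expansion hurt, and reward as the percent change in mean AP relative to the unexpanded baseline. The function names and the `alpha` setting label are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def risk_reward_point(baseline_ap: List[float],
                      expanded_ap: List[float]) -> Tuple[float, float]:
    """Return (risk, reward) for one expansion setting.

    Assumed definitions (not from the paper):
      risk   = sum of AP losses over queries where expansion did worse
      reward = percent change in mean AP relative to the baseline
    """
    if not baseline_ap or len(baseline_ap) != len(expanded_ap):
        raise ValueError("need two non-empty score lists of equal length")

    # Only queries that got worse contribute to risk.
    risk = sum(max(0.0, b - e) for b, e in zip(baseline_ap, expanded_ap))

    base_map = sum(baseline_ap) / len(baseline_ap)
    if base_map == 0.0:
        raise ValueError("baseline mean AP is zero; relative gain is undefined")
    exp_map = sum(expanded_ap) / len(expanded_ap)
    reward = 100.0 * (exp_map - base_map) / base_map
    return risk, reward


def risk_reward_curve(baseline_ap: List[float],
                      runs: Dict[float, List[float]]) -> List[Tuple[float, float, float]]:
    """Return (alpha, risk, reward) triples, sorted by the setting alpha.

    `runs` maps a hypothetical expansion-strength setting alpha to the
    per-query AP scores obtained with that setting.
    """
    return [(alpha, *risk_reward_point(baseline_ap, ap))
            for alpha, ap in sorted(runs.items())]


if __name__ == "__main__":
    baseline = [0.5, 0.25, 0.25]
    runs = {
        0.0: [0.5, 0.25, 0.25],    # no expansion: identical to baseline
        1.0: [0.75, 0.125, 0.25],  # full expansion: one gain, one loss
    }
    for alpha, risk, reward in risk_reward_curve(baseline, runs):
        print(f"alpha={alpha:.1f}  risk={risk:.3f}  reward={reward:+.1f}%")
```

Under these assumed definitions, a setting whose point lies higher and further left than another gives more average gain for less loss on the queries it hurts.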
