Defining an Optimal Configuration Set for Selective Search Strategy - A Risk-Sensitive Approach

A search engine generally applies a single search strategy to every user query. That strategy combines many component processes (e.g., indexing, query expansion, the search-weighting model, document ranking) and their hyperparameters, whose values are optimized on past queries and then applied to all future queries. Even an optimized system may perform poorly on some queries, however, while another system might handle those same queries better. A selective search strategy instead aims to select, for each individual query, the most appropriate combination of components and hyperparameter values. The number of candidate combinations is huge: to adapt best to any query, the ideal system would keep many combinations available, yet in practice using and maintaining thousands of configurations would be too costly. A trade-off must therefore be found between effectiveness and cost. In this paper, we describe a risk-sensitive approach to optimizing the set of configurations included in a selective search strategy; it answers both which configurations to include and how many. We show on three TREC reference collections that using 20 configurations is significantly more effective than current approaches, by about 23% compared with learning-to-rank (L2R) of documents and about 10% compared with other selective approaches, and that it offers an appropriate trade-off between system complexity and system effectiveness.
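The abstract does not spell out the selection procedure itself. As a rough illustration only, the sketch below shows one way a risk-sensitive configuration-set optimization could be organized: configurations are added greedily whenever they improve a URisk-style utility that rewards per-query gains over a baseline but penalizes losses by a factor (1 + alpha), assuming per-query effectiveness scores of each candidate configuration on past queries and an oracle selector that applies the best configuration in the set to each query. The function names, the alpha parameter, the greedy strategy, and the budget of 20 configurations are illustrative assumptions, not the authors' actual algorithm.

```python
import numpy as np

def urisk(selected_scores, baseline_scores, alpha=1.0):
    """URisk-style utility: count gains over the baseline once,
    penalise losses against the baseline by a factor (1 + alpha)."""
    delta = selected_scores - baseline_scores
    penalised = np.where(delta >= 0, delta, (1.0 + alpha) * delta)
    return penalised.mean()

def greedy_configuration_set(per_query_scores, baseline_scores,
                             max_configs=20, alpha=1.0):
    """Hypothetical greedy construction of the configuration set.

    per_query_scores : array (n_configs, n_queries) of effectiveness
        (e.g. AP) of each candidate configuration on past queries.
    baseline_scores  : array (n_queries,) for the single reference system.
    Assumes an oracle selector that, for a given set, applies to each
    query the best configuration in that set.
    """
    n_configs, n_queries = per_query_scores.shape
    chosen, best_utility = [], -np.inf
    # Best achievable score per query with the configurations chosen so far.
    current_best = np.full(n_queries, -np.inf, dtype=float)

    for _ in range(max_configs):
        best_cand, best_cand_scores, best_cand_util = None, None, best_utility
        for c in range(n_configs):
            if c in chosen:
                continue
            cand_scores = np.maximum(current_best, per_query_scores[c])
            util = urisk(cand_scores, baseline_scores, alpha)
            if util > best_cand_util:
                best_cand, best_cand_scores, best_cand_util = c, cand_scores, util
        if best_cand is None:  # no remaining configuration improves the utility
            break
        chosen.append(best_cand)
        current_best, best_utility = best_cand_scores, best_cand_util
    return chosen, best_utility
```

In this sketch the stopping rule is implicit: the loop halts either at the configuration budget or as soon as no candidate strictly improves the risk-sensitive utility, which is one simple way the "which and how many" question could be resolved jointly.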
