Predicting the Best System Parameter Configuration: the Per Parameter Learning (PPL) Method

Search engines aim to deliver the most relevant information whatever the query is. To do so, they employ various modules (indexing, matching, ranking), each of which has different variants (e.g. different stemmers, different retrieval models or weighting functions). International evaluation campaigns in information retrieval such as TREC have revealed system variability that makes it impossible to find a single system that is best for every query. While some approaches optimize the system parameters to improve effectiveness on average over a set of queries, in this paper we consider a different approach that optimizes the system configuration on a per-query basis. Our method learns the configuration models in a training phase, then explores the system feature space and decides what the system configuration should be for any new query. The experimental results lead to three main conclusions: (i) predicting the best value for each system feature separately is more effective than predicting the best predefined system configuration; (ii) the method successfully predicts the optimal or near-optimal system configurations for unseen queries; (iii) the mean average precision (MAP) of the system configurations predicted by our approach is much higher than the MAP of the best unique system.
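The per-parameter idea summarized above can be illustrated with a minimal sketch. The abstract does not specify the learning algorithm or the query features, so everything here is an assumption made for illustration: the function names `train_ppl` and `predict_config` are hypothetical, queries are represented by small numeric feature tuples, and a k-nearest-neighbour vote stands in for whatever learner the method actually uses. One independent predictor is kept per system parameter, and the predicted configuration is assembled from the individual predictions.

```python
from collections import Counter
from math import dist


def train_ppl(training_queries):
    """Group training examples by system parameter (illustrative, not the paper's learner).

    training_queries: list of (query_features, best_config) pairs, where
    best_config maps each parameter name to its best value for that query.
    Returns a dict: parameter name -> list of (query_features, best_value).
    """
    models = {}
    for features, best_config in training_queries:
        for param, value in best_config.items():
            models.setdefault(param, []).append((features, value))
    return models


def predict_config(models, features, k=3):
    """Predict each parameter value separately by majority vote among the
    k training queries closest to the new query (assumed stand-in learner)."""
    config = {}
    for param, examples in models.items():
        nearest = sorted(examples, key=lambda ex: dist(ex[0], features))[:k]
        votes = Counter(value for _, value in nearest)
        config[param] = votes.most_common(1)[0][0]
    return config


# Hypothetical training data: (query features, best value of each parameter).
training = [
    ((2.0, 0.9), {"stemmer": "porter", "model": "BM25"}),
    ((2.5, 0.8), {"stemmer": "porter", "model": "BM25"}),
    ((3.0, 0.7), {"stemmer": "porter", "model": "LM"}),
    ((8.0, 0.2), {"stemmer": "none", "model": "LM"}),
    ((9.0, 0.1), {"stemmer": "none", "model": "LM"}),
    ((7.5, 0.3), {"stemmer": "none", "model": "BM25"}),
]

models = train_ppl(training)
print(predict_config(models, (2.2, 0.85)))
```

Because each parameter is predicted on its own, the assembled configuration need not match any single predefined configuration, which is the contrast drawn in conclusion (i).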
