Query Performance Prediction Focused on Summarized Letor Features

Query performance prediction (QPP) aims to automatically estimate the effectiveness of an information retrieval system for any user query. Previous work has investigated several types of pre- and post-retrieval query performance predictors; the latter have been shown to be more effective. In this paper we investigate whether features originally defined for learning to rank (Letor) can be used for QPP. While these features have been shown to be useful for learning to rank documents, they have never been studied as query performance predictors. We developed more than 350 variants of them based on summary functions. In experiments on four standard TREC collections, we found that Letor-based features appear to be better query performance predictors than those from the literature. Moreover, we show that combining the best Letor features outperforms state-of-the-art query performance predictors. This is the first study that considers such a number and variety of Letor features for QPP and that demonstrates they are appropriate for this task.
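To make the idea of summarizing Letor features concrete, the following is a minimal sketch rather than the procedure used in the paper. It assumes that each Letor feature yields one value per retrieved document for a query, and that a summary function collapses those values into a single query-level predictor. The particular summary functions, the feature name `bm25`, the `top_k` cutoff, and the function name `summarize_letor_features` are all hypothetical choices made for illustration.

```python
import statistics
from typing import Callable, Dict, Optional, Sequence

# Hypothetical set of summary functions; the abstract does not list
# the ones actually used in the paper.
SUMMARIES: Dict[str, Callable[[Sequence[float]], float]] = {
    "min": min,
    "max": max,
    "mean": statistics.fmean,
    "median": statistics.median,
    "std": statistics.pstdev,  # population standard deviation
}


def summarize_letor_features(
    per_doc_features: Dict[str, Sequence[float]],
    top_k: Optional[int] = None,
) -> Dict[str, float]:
    """Collapse per-document feature values for one query into
    query-level candidate predictors, one per (feature, summary) pair.

    per_doc_features maps a feature name to its values over the
    retrieved documents, in rank order. If top_k is given, only the
    first top_k documents are summarized.
    """
    predictors: Dict[str, float] = {}
    for feature_name, values in per_doc_features.items():
        selected = list(values) if top_k is None else list(values)[:top_k]
        if not selected:
            continue  # no retrieved documents: nothing to summarize
        for summary_name, summary_fn in SUMMARIES.items():
            key = f"{feature_name}_{summary_name}"
            predictors[key] = float(summary_fn(selected))
    return predictors


if __name__ == "__main__":
    # Toy scores for three retrieved documents (made-up values).
    example = {"bm25": [3.0, 1.0, 2.0]}
    for name, value in summarize_letor_features(example).items():
        print(name, round(value, 4))
```

Under these assumptions, each feature contributes one variant per summary function, so a few dozen base features combined with a handful of summaries would already produce a few hundred variants.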
