Query-performance prediction and cluster ranking: two sides of the same coin

We show that two tasks that have been addressed independently in the information retrieval literature amount to the same task. The first is query-performance prediction, that is, estimating the effectiveness of a search performed in response to a query in the absence of relevance judgments. The second is cluster ranking, that is, ranking clusters of similar documents by their presumed effectiveness (i.e., relevance) with respect to the query. Furthermore, we show that several state-of-the-art methods that were devised independently for each of the two tasks are based on the same principles. Finally, we empirically demonstrate that insights gained in work on query-performance prediction can, in many cases, help improve the performance of a previously proposed cluster ranking method.
