Selective Cluster-Based Document Retrieval

We address the long-standing challenge of selective cluster-based retrieval: deciding, on a per-query basis, whether to apply cluster-based document retrieval or standard document retrieval. To address this classification task, we propose several feature sets based on those used by the cluster-based ranker, on query-performance predictors, and on properties of the clustering structure. Empirical evaluation shows that our method outperforms state-of-the-art retrieval approaches, including cluster-based, query-expansion, and term-proximity methods.
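The per-query decision described above can be viewed as routing each query through a learned decision rule over predictor-based features. The sketch below is a minimal, hypothetical illustration of that idea, assuming a simple linear rule and made-up feature names (`avg_idf`, `clarity`); the paper's actual classifier and feature sets differ.

```python
def extract_features(query, predictor_scores):
    """Build an illustrative feature vector for one query.

    predictor_scores: dict of (hypothetical) pre-retrieval
    query-performance-predictor values for this query.
    """
    return [
        len(query.split()),                    # query length, a common pre-retrieval cue
        predictor_scores.get("avg_idf", 0.0),  # assumed average-IDF predictor
        predictor_scores.get("clarity", 0.0),  # assumed clarity-style predictor
    ]


def select_retrieval_method(features, weights, bias):
    """Linear decision rule: positive score routes the query to
    cluster-based retrieval, otherwise to standard retrieval."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return "cluster-based" if score > 0 else "standard"


# Illustrative usage with arbitrary weights (not learned from data):
feats = extract_features("jaguar speed record", {"avg_idf": 2.0, "clarity": 1.5})
method = select_retrieval_method(feats, weights=[0.1, 0.5, 0.4], bias=-1.0)
```

In practice such weights would be learned from training queries labeled by which retrieval strategy performed better; here they are fixed only to make the routing decision concrete.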
