Cluster-based fusion of retrieved lists

Methods for fusing document lists retrieved in response to a query often rely on the retrieval scores (or ranks) of documents in the lists. We present a novel probabilistic fusion approach that utilizes an additional rich source of information: inter-document similarities. Specifically, our model integrates information induced from clusters of similar documents, created across the lists, with information produced by a fusion method that relies on retrieval scores (or ranks). Empirical evaluation shows that the approach is highly effective for fusion. For example, our model consistently outperforms the standard, effective fusion method that it integrates. It also outperforms standard fusion of re-ranked lists, where each list is re-ranked using clusters created from the documents in that list.
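
The general idea of combining a score-based fusion method with cluster information can be illustrated with a minimal sketch. It is not the model evaluated in the paper: it assumes CombSUM with min-max normalization as the score-based fusion method, nearest-neighbor clusters built from cosine similarity over sparse term vectors, the mean fused score of a cluster's members as the cluster score, and a linear interpolation weight `lam`. The names `comb_sum`, `cosine`, `cluster_fuse`, `k`, and `lam`, as well as the toy data, are hypothetical.

```python
import math
from collections import defaultdict


def comb_sum(lists):
    """CombSUM: sum each document's min-max normalized scores over all lists."""
    fused = defaultdict(float)
    for scores in lists:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for doc, score in scores.items():
            fused[doc] += (score - lo) / span
    return dict(fused)


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_fuse(lists, vectors, k=2, lam=0.5):
    """Interpolate fused scores with scores of clusters built across the lists.

    Assumption: each pooled document seeds one cluster made of itself and its
    k-1 most similar pooled documents, so clusters may overlap.
    """
    fused = comb_sum(lists)
    docs = list(fused)

    clusters = {}
    for seed in docs:
        others = sorted(
            (d for d in docs if d != seed),
            key=lambda d: cosine(vectors[seed], vectors[d]),
            reverse=True,
        )
        clusters[seed] = [seed] + others[: k - 1]

    # Assumption: a cluster's score is the mean fused score of its members.
    cluster_score = {
        seed: sum(fused[m] for m in members) / len(members)
        for seed, members in clusters.items()
    }

    final = {}
    for doc in docs:
        containing = [s for s, members in clusters.items() if doc in members]
        from_clusters = sum(cluster_score[s] for s in containing) / len(containing)
        final[doc] = (1 - lam) * fused[doc] + lam * from_clusters

    ranking = sorted(final, key=final.get, reverse=True)
    return ranking, final


# Toy data (hypothetical): two retrieved lists and term vectors for the pooled documents.
run_a = {"d1": 1.0, "d2": 0.6, "d3": 0.0}
run_b = {"d1": 1.0, "d4": 0.5, "d3": 0.0}
vectors = {
    "d1": {"fusion": 1.0, "cluster": 1.0},
    "d2": {"sports": 1.0},
    "d3": {"sports": 1.0, "news": 1.0},
    "d4": {"fusion": 1.0, "cluster": 1.0},
}

ranking, scores = cluster_fuse([run_a, run_b], vectors, k=2, lam=0.5)
print(ranking)  # ['d1', 'd4', 'd2', 'd3']
```

In this toy example CombSUM alone ranks d2 above d4. Because d4 is similar to the highly scored d1, the cluster term lifts it above d2, even though d4 appears in only one list. Setting `lam=0` recovers the plain CombSUM ranking.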
