Testing the Cluster Hypothesis with Focused and Graded Relevance Judgments

The cluster hypothesis is a fundamental concept in ad hoc retrieval. Until now, cluster hypothesis tests have been applied to documents using binary relevance judgments. We present novel tests that use graded and focused relevance judgments; the latter are markups of relevant text within relevant documents. Empirical exploration reveals that, as measured by the proposed tests, the cluster hypothesis holds not only for documents but also for passages. Furthermore, the hypothesis holds to a greater extent for highly relevant documents and for documents that contain a high fraction of relevant text.
