Query polyrepresentation for ranking retrieval systems without relevance judgments

Ranking information retrieval (IR) systems by their effectiveness is a crucial operation in IR evaluation, as well as in data fusion. This article offers a novel method for approaching the system-ranking problem, based on the widely studied idea of polyrepresentation. The principle of polyrepresentation suggests that a single information need can be represented by many query articulations, which we call query aspects. By skimming the top k (where k is small) documents retrieved by a single system for multiple query aspects, we collect a set of documents that are likely to be relevant to a given test topic. Labeling these skimmed documents as putatively relevant lets us build pseudorelevance judgments without undue human intervention. We report experiments in which these pseudorelevance judgments deliver a rank ordering of IR systems that correlates highly with rankings based on human relevance judgments. © 2010 Wiley Periodicals, Inc.
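
To make the skimming-and-pooling procedure concrete, the following is a minimal sketch of how aspect-based pseudorelevance judgments might be built and used to score systems. It is not the article's implementation: the skim depth k, the choice of precision at n as the effectiveness measure, and all function and variable names (build_pseudo_qrels, rank_systems, etc.) are illustrative assumptions.

```python
from collections import defaultdict


def build_pseudo_qrels(aspect_runs, k=10):
    """Pool the top-k documents retrieved (by one reference system) for
    each query aspect of a topic, labeling the union putatively relevant.

    aspect_runs: {topic_id: {aspect_id: [doc_id, ...]}}, ranked lists
    returns:     {topic_id: set(doc_id)}, the pseudorelevance judgments
    """
    qrels = defaultdict(set)
    for topic, aspects in aspect_runs.items():
        for ranking in aspects.values():
            # Skim only the top k documents per aspect; k is small.
            qrels[topic].update(ranking[:k])
    return qrels


def precision_at_n(ranking, relevant, n=10):
    """Fraction of the top-n retrieved documents that are (pseudo-)relevant."""
    top = ranking[:n]
    return sum(1 for doc in top if doc in relevant) / n if top else 0.0


def rank_systems(system_runs, qrels, n=10):
    """Order systems by mean precision@n under the pseudo-qrels.

    system_runs: {system_id: {topic_id: [doc_id, ...]}}
    """
    scores = {
        sys_id: sum(precision_at_n(run.get(topic, []), relevant, n)
                    for topic, relevant in qrels.items()) / len(qrels)
        for sys_id, run in system_runs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

An ordering produced this way could then be compared with a ranking derived from human relevance judgments using a rank correlation such as Kendall's tau (e.g., scipy.stats.kendalltau), which is the kind of agreement the reported experiments measure.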
