A new statistical strategy for pooling: ELI

Obtaining exhaustive relevance judgments is one of the most challenging tasks in constructing an IR test collection, especially when the collection contains millions of documents. Pooling (or system pooling), a method for selecting which documents to assess, is a standard way to overcome this challenge. In this paper, a new rank-based document selection criterion for forming the assessment pool, called the expected level of importance (ELI), is introduced. Experiments on TREC 5, 6, 7, and 8 data show that when the pooled documents are sorted in decreasing order of their ELI scores, relevance judgments can be made efficiently with minimal human effort, while preserving the size and the effectiveness of the resulting test collection. The proposed criterion can be adapted directly to the traditional TREC pooling practice, improving efficiency at no additional cost.
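The abstract does not reproduce the ELI formula itself, but the workflow it describes, pooling the top-ranked documents from each submitted run and then judging the pool in decreasing order of a per-document importance score, can be sketched as below. The scoring function used here (a sum of reciprocal ranks across runs) is a hypothetical stand-in, not the actual ELI criterion; the names `build_ordered_pool`, `runs`, and `pool_depth` are illustrative.

```python
from collections import defaultdict

def build_ordered_pool(runs, pool_depth=100):
    """Pool the top `pool_depth` documents of each run for one topic,
    then order the pooled documents by a rank-derived importance score
    so that assessors judge from the top of the ordered pool downward.

    `runs` maps a run id to that run's ranked list of document ids.
    NOTE: the reciprocal-rank score below is a placeholder for
    illustration only; it is not the paper's ELI formula.
    """
    scores = defaultdict(float)
    for ranked_docs in runs.values():
        for rank, doc_id in enumerate(ranked_docs[:pool_depth], start=1):
            # A document ranked highly by many systems accumulates a
            # larger score and is therefore judged earlier.
            scores[doc_id] += 1.0 / rank
    # Decreasing score order: the most "important" documents come first.
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    runs = {
        "runA": ["d3", "d1", "d7", "d2"],
        "runB": ["d1", "d5", "d3", "d9"],
    }
    # d1 and d3 are ranked highly by both runs, so they head the ordered pool.
    print(build_ordered_pool(runs, pool_depth=3))
```

Under a fixed judging budget, assessors can stop partway down this ordered pool; the claim in the abstract is that an ELI-style ordering concentrates the likely-relevant documents near the top, so early stopping costs little in collection quality.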
