A few good topics: Experiments in topic set reduction for retrieval evaluation

We consider the issue of evaluating information retrieval systems on the basis of a limited number of topics. In contrast to statistically based work on sample sizes, we hypothesize that some topics or topic sets are better than others at predicting true system effectiveness, and that with the right choice of topics, accurate predictions can be obtained from small topic sets. Using a variety of effectiveness metrics and measures of goodness of prediction, a study of a set of TREC and NTCIR results confirms this hypothesis, and provides evidence that the value of a topic set for this purpose does generalize.
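The hypothesis can be illustrated with a minimal sketch. This is not the paper's procedure. It assumes a hypothetical matrix of per-topic effectiveness scores (one row per system, one column per topic), treats the mean over all topics as the "true" effectiveness, and uses Kendall's tau as one possible measure of goodness of prediction. The names `kendall_tau`, `mean_scores`, and `rank_subsets`, and the toy data, are invented for illustration.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def mean_scores(scores, topics):
    """Mean effectiveness of each system over the given topic indices."""
    return [sum(row[t] for t in topics) / len(topics) for row in scores]


def rank_subsets(scores, k):
    """Score every topic subset of size k by how well it predicts the
    system ordering obtained from the full topic set (exhaustive search,
    so only feasible for small toy inputs)."""
    n_topics = len(scores[0])
    full = mean_scores(scores, range(n_topics))
    results = [
        (kendall_tau(full, mean_scores(scores, subset)), subset)
        for subset in combinations(range(n_topics), k)
    ]
    results.sort(reverse=True)  # best-predicting subsets first
    return results


# Hypothetical toy data: 4 systems (rows) x 5 topics (columns).
scores = [
    [0.50, 0.20, 0.70, 0.10, 0.40],
    [0.40, 0.30, 0.60, 0.20, 0.30],
    [0.30, 0.40, 0.50, 0.30, 0.20],
    [0.20, 0.50, 0.40, 0.40, 0.10],
]

ranked = rank_subsets(scores, 1)
print("best single topic:", ranked[0])
print("worst single topic:", ranked[-1])
```

In this toy data some single topics reproduce the full-set system ordering exactly while others invert it, which is the kind of difference between topic sets the hypothesis refers to. Whether a subset chosen this way remains predictive on other systems or collections is the generalization question the study addresses, and this sketch does not test it.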
