Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation

We consider Information Retrieval evaluation, especially at TREC with the trec_eval program. It appears that systems obtain scores that depend not only on the relevance of the retrieved documents, but also on the documents' names in case of ties (i.e., when several documents are retrieved with the same score). We regard this tie-breaking strategy as an uncontrolled parameter that influences measure scores, and argue the case for fairer tie-breaking strategies. A study of 22 TREC editions reveals significant differences between TREC's conventional, unfair strategy and the fairer strategies we propose. This experimental result advocates using these fairer strategies when conducting evaluations.
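To make the effect concrete, here is a minimal sketch in Python (not the trec_eval implementation; the document IDs, scores, and relevance judgments are hypothetical): when several documents share a retrieval score, the order chosen among them by document name alone changes Average Precision.

```python
def average_precision(ranking, relevant):
    """Average Precision over a ranked list of doc IDs, given the relevant set."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ranking(run, name_reverse):
    """Order a run by decreasing score, breaking ties by document name.

    Python's sort is stable, so sorting by name first and then by score
    preserves the chosen name order among tied documents.
    """
    by_name = sorted(run, key=lambda x: x[0], reverse=name_reverse)
    return [doc for doc, _ in sorted(by_name, key=lambda x: x[1], reverse=True)]

# Hypothetical run for one topic: D1, D2 and D4 are tied at score 0.5.
run = [("D3", 0.9), ("D1", 0.5), ("D2", 0.5), ("D4", 0.5), ("D5", 0.1)]
relevant = {"D1", "D5"}  # hypothetical relevance judgments

ap_asc = average_precision(ranking(run, name_reverse=False), relevant)  # ties as D1, D2, D4
ap_desc = average_precision(ranking(run, name_reverse=True), relevant)  # ties as D4, D2, D1
print(f"AP with ascending tie-break:  {ap_asc:.3f}")   # 0.450
print(f"AP with descending tie-break: {ap_desc:.3f}")  # 0.325
```

The same run thus receives two different scores depending only on how ties are ordered by name, which is precisely the uncontrolled parameter studied here.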
