Analysis of evaluation criteria for information retrieval systems

Evaluating information retrieval requires a document collection over which searches are carried out, a set of test queries, and, for each query, the list of relevant documents. This evaluation framework also includes evaluation measures that quantify the impact of search parameters on retrieval performance. The trec_eval tool computes a large number of measures, some more widely used than others, such as mean average precision (MAP) or recall-precision curves. The aim of this paper is to identify the minimal set of measures needed to compare information retrieval systems. We present a study of the relationships between 27 of the measures most used in the literature, and show that a set of 7 measures is enough to represent all 27: ircl_prn.80, MAP, ircl_prn.20, recip_rank, P15, exact_precision, and exact_recall.
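To make these measure names concrete, the sketch below shows, in Python, how trec_eval-style values such as P15, average precision (whose mean over all test queries is the MAP), recip_rank, and the interpolated recall-precision points (ircl_prn.20, ircl_prn.80) can be computed for a single query. This is an illustrative sketch, not the authors' or trec_eval's implementation; the function names, the variable ranking (the ordered list of retrieved document ids), and relevant (the set of judged-relevant documents) are assumptions made for the example.

    def precision_at_k(ranking, relevant, k):
        # Fraction of the top-k retrieved documents that are relevant (P15 uses k = 15).
        return sum(1 for doc in ranking[:k] if doc in relevant) / k

    def average_precision(ranking, relevant):
        # Precision at each rank where a relevant document appears, averaged over
        # the total number of relevant documents; its mean over queries is the MAP.
        if not relevant:
            return 0.0
        hits, total = 0, 0.0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                total += hits / rank
        return total / len(relevant)

    def reciprocal_rank(ranking, relevant):
        # 1 / rank of the first relevant document (recip_rank); 0 if none is retrieved.
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                return 1.0 / rank
        return 0.0

    def interpolated_precision(ranking, relevant, recall_level):
        # Best precision reached at any rank where recall is at least recall_level;
        # recall_level = 0.2 and 0.8 correspond to ircl_prn.20 and ircl_prn.80.
        if not relevant:
            return 0.0
        best, hits = 0.0, 0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
            if hits / len(relevant) >= recall_level:
                best = max(best, hits / rank)
        return best

    # Toy example: the two relevant documents are retrieved at ranks 2 and 4.
    ranking = ["d3", "d1", "d7", "d2", "d5"]
    relevant = {"d1", "d2"}
    print(average_precision(ranking, relevant))            # (1/2 + 2/4) / 2 = 0.5
    print(reciprocal_rank(ranking, relevant))              # first hit at rank 2 -> 0.5
    print(interpolated_precision(ranking, relevant, 0.8))  # recall 1.0 reached at rank 4 -> 0.5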
