PRES: a score metric for evaluating recall-oriented information retrieval applications

Information retrieval (IR) evaluation scores are generally designed to measure how effectively relevant documents are identified and retrieved. Many scores have been proposed for this purpose over the years. These have primarily focused on aspects of precision and recall, and although the two are often discussed as equally important, in practice most attention has been given to precision-focused metrics. Even for recall-oriented IR tasks of growing importance, such as patent retrieval, these precision-based scores remain the primary evaluation measures. Our study examines different evaluation measures for a recall-oriented patent retrieval task and demonstrates the limitations of the current scores for comparing different IR systems on this task. We introduce PRES, a novel evaluation metric for this type of application that takes account of recall and the user's search effort. The behaviour of PRES is demonstrated on 48 runs from the CLEF-IP 2009 patent retrieval track. A full analysis of the performance of PRES shows its suitability for measuring the retrieval effectiveness of systems from a recall-focused perspective while taking into account the user's expected search effort.
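As an illustration of how such a metric can be computed in practice, the sketch below implements PRES per topic under the assumption that it follows the published definition PRES = 1 - ((sum of r_i)/n - (n+1)/2)/N_max, where r_i are the ranks of the n relevant documents and relevant documents not retrieved within the top N_max results are assumed to occupy the worst positions of an extended list of length N_max + n. The abstract does not restate the formula, so this definition, together with the function and variable names (pres, ranked_list, relevant, n_max), should be read as an illustrative assumption rather than the authors' reference implementation.

    # Minimal PRES sketch (Python). Names are illustrative; the formula is an
    # assumption based on the published definition of PRES.

    def pres(ranked_list, relevant, n_max):
        """Patent Retrieval Evaluation Score for a single topic.

        ranked_list -- system output for the topic, best document first
        relevant    -- set of ids of the relevant (e.g. prior-art) documents
        n_max       -- maximum number of results the user is assumed to examine
        """
        n = len(relevant)
        if n == 0:
            raise ValueError("PRES is undefined for a topic with no relevant documents")

        # 1-based ranks of relevant documents retrieved within the first n_max results.
        found_ranks = [i + 1 for i, doc in enumerate(ranked_list[:n_max]) if doc in relevant]
        k = len(found_ranks)

        # Relevant documents missing from the top n_max are assumed to sit at the
        # bottom of an extended list of length n_max + n (worst possible positions).
        missing_ranks = [n_max + i for i in range(k + 1, n + 1)]

        total_rank = sum(found_ranks) + sum(missing_ranks)
        return 1.0 - (total_rank / n - (n + 1) / 2.0) / n_max


    # Toy usage: 2 of 3 relevant documents retrieved within n_max = 10.
    run = ["d7", "d2", "d9", "d1", "d5"]
    qrels = {"d2", "d5", "d8"}
    print(round(pres(run, qrels, n_max=10), 3))  # 0.533; recall here is 2/3

Under this definition the score is 1 when all relevant documents appear at the top of the ranking, 0 when none are retrieved within n_max, and for a given recall level it rewards systems that rank the found relevant documents earlier, i.e. those that demand less search effort from the user.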
