On the history of evaluation in IR

This paper is a personal take on the history of evaluation experiments in information retrieval. It describes some of the early experiments that were formative in our understanding, and goes on to discuss the current dominance of TREC (the Text REtrieval Conference) and to assess its impact.
