Variations in Relevance Judgments and the Evaluation of Retrieval Performance

Abstract

The relevance judgments used to evaluate the performance of information retrieval systems are known to vary among judges and to vary under certain conditions extraneous to the relevance relationship between queries and documents. The study reported here investigated the degree to which variations in relevance judgments affect the evaluation of retrieval performance. Four sets of relevance judgments were used to test the retrieval effectiveness of six document representations. In no case was there a noticeable or material difference in retrieval performance due to variations in relevance judgment. Additionally, for each set of relevance judgments, the relative performance of the six different document representations was the same. Reasons why variations in relevance judgments may not affect recall and precision results were examined in further detail.
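The comparison described in the abstract can be illustrated with a minimal sketch: compute precision and recall for each document representation under each set of relevance judgments, then check whether the representations rank in the same order under every set. This is not the study's procedure or data. The names `rep_A`, `rep_B`, `judge_1`, and `judge_2`, the document identifiers, and the choice of precision as the ranking criterion are all hypothetical, chosen only to show the shape of the comparison.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against one judgment set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def rank_systems(runs, relevant):
    """Order representation names from highest to lowest precision."""
    return sorted(
        runs,
        key=lambda name: precision_recall(runs[name], relevant)[0],
        reverse=True,
    )


# Hypothetical toy data: documents retrieved by two document representations.
runs = {
    "rep_A": ["d1", "d2", "d3", "d4"],
    "rep_B": ["d1", "d5", "d6", "d7"],
}

# Hypothetical judgment sets from two judges who disagree on some documents.
judgments = {
    "judge_1": {"d1", "d2", "d3"},
    "judge_2": {"d1", "d2", "d4", "d5"},
}

# Rank the representations under each judgment set.
orderings = {judge: rank_systems(runs, rel) for judge, rel in judgments.items()}

# True when every judgment set yields the same relative ordering.
same_ordering = len({tuple(order) for order in orderings.values()}) == 1

for judge, order in orderings.items():
    print(judge, order)
print("same relative ordering:", same_ordering)
```

In this toy case the two judges disagree about which documents are relevant, so the absolute precision and recall values differ, yet the relative ordering of the two representations is unchanged. That is the kind of stability the abstract reports, shown here on invented data only.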
