Relevance judgments between TREC and Non-TREC assessors

This paper investigates the agreement of relevance assessments between official TREC judgments and those generated from an interactive IR experiment. Results show that 63% of documents judged relevant by our users matched official TREC judgments. Several factors contributed to differences in the agreements: the number of retrieved relevant documents; the number of relevant documents judged; system effectiveness per topic and the ranking of relevant documents.