The effect of threshold priming and need for cognition on relevance calibration and assessment

Human assessments of document relevance are needed for the construction of test collections, for ad hoc evaluation, and for training text classifiers. Showing documents to assessors in different orderings, however, may lead to different assessment outcomes. We examine the effect that \defineterm{threshold priming}, exposure to documents of varying degrees of relevance, has on people's calibration of relevance. Participants judged the relevance of a prologue of highly relevant, moderately relevant, or non-relevant documents, followed by a common epilogue of documents of mixed relevance. We observe that participants exposed to only non-relevant documents in the prologue assigned significantly higher average relevance scores to both prologue and epilogue documents than participants exposed to moderately or highly relevant documents in the prologue. We also examine how \defineterm{need for cognition}, an individual-difference measure of the extent to which a person enjoys engaging in effortful cognitive activity, affects relevance assessments. Participants high in need for cognition showed significantly higher agreement with expert assessors than participants low in need for cognition. Our findings indicate that assessors should be exposed to documents from multiple relevance levels early in the judging process, so that they calibrate their relevance thresholds in a balanced way, and that individual-difference measures may be a useful way to screen assessors.
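To make the agreement finding concrete: one standard way to quantify how closely an assessor's judgments track an expert's is a chance-corrected agreement statistic such as Cohen's kappa. The sketch below is illustrative only, not the paper's actual analysis; the 0-3 graded relevance scale, the sample judgments, and the choice of kappa are assumptions for the example.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Chance-corrected agreement between two equal-length label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed proportion of exact agreement.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under independence, from each judge's label marginals.
    ca, cb = Counter(a), Counter(b)
    expected = sum((ca[l] / n) * (cb[l] / n) for l in set(ca) | set(cb))
    if expected == 1.0:  # degenerate case: both judges use a single label
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical judgments on a 0 (non-relevant) to 3 (highly relevant) scale.
assessor = [2, 3, 0, 1, 3, 2, 0, 0, 1, 2]
expert   = [2, 3, 0, 0, 3, 2, 1, 0, 1, 3]
print(f"kappa = {cohen_kappa(assessor, expert):.2f}")  # 0.60 for this toy data
```

Raw percent agreement overstates reliability on skewed relevance distributions, which is why a chance-corrected measure is the usual choice when comparing assessor populations such as the high and low need-for-cognition groups here.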
