Evaluating the effort involved in relevance assessments for images

How assessors and end users judge the relevance of images has been studied in information science and information retrieval for a considerable time. The criteria by which assessors judge relevance have been examined intensively, and a large body of work has investigated how relevance judgments for test collections can be generated more cheaply, for example through crowdsourcing. Relatively little work, however, has examined the process an individual assessor goes through to judge the relevance of an image. In this paper, we focus on that process, and in particular on the degree of effort a user must expend to judge relevance for different topics. Our results suggest that topic difficulty, and the extent to which a topic is semantic or visual, affect both user performance and perceived effort.
