Creating a test collection to evaluate diversity in image retrieval

This paper describes the adaptation of an existing image retrieval test collection to enable the diversity of a result set to be measured. Previous research has shown that a more diverse set of results often satisfies the needs of a wider range of users than standard document rankings do. To quantify diversity, images relevant to a given theme must be classified into one or more sub-topics or clusters. We describe the challenges in building what is, to our knowledge, the first test collection for evaluating diversity in image retrieval, including selecting appropriate topics, creating sub-topics, and quantifying the overall effectiveness of a retrieval system. A total of 39 topics were augmented with cluster-based relevance judgments, and we also provide an initial analysis of assessor agreement when grouping relevant images into sub-topics or clusters.