Div400: a social image retrieval result diversification dataset

This paper introduces Div400, a new dataset designed to support shared evaluation across different areas of social media photo retrieval, e.g., machine analysis (re-ranking, machine learning), human-based computation (crowdsourcing), and hybrid approaches (relevance feedback, machine-crowd integration). Div400 comes with relevance and diversity assessments performed by human annotators. It covers 396 landmark locations, represented by 43,418 Flickr photos and metadata, Wikipedia pages, and content descriptors for the text and visual modalities. To facilitate distribution, only Creative Commons content was included in the dataset. The dataset was validated during the 2013 Retrieving Diverse Social Images Task at the MediaEval Benchmarking Initiative for Multimedia Evaluation.
