Subdiv17: a dataset for investigating subjectivity in the visual diversification of image search results

In this paper, we present a new dataset that facilitates the comparison of approaches to diversifying image search results. The dataset was explicitly designed for general-purpose, multi-topic queries and provides multiple ground-truth annotations, which make it possible to study the subjectivity inherent in the diversification task. It contains images and their metadata retrieved from Flickr for around 200 complex queries. Additionally, to encourage experimentation (and collaboration) across communities such as information retrieval and multimedia retrieval, a broad range of pre-computed descriptors is provided. The proposed dataset was successfully validated during the MediaEval 2017 Retrieving Diverse Social Images task using 29 submitted runs.
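
To illustrate what multiple ground truths enable, the following minimal sketch scores a single diversified ranking separately against each annotator's ground truth. It makes two assumptions that the abstract does not state. First, diversity is measured as cluster recall at a cutoff N, meaning the share of an annotator's visual clusters that appear among the relevant images in the top N. Second, each ground truth uses a hypothetical layout: a dict mapping every relevant image ID to a cluster ID. The function names are illustrative and are not part of the dataset's tooling.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical layout: one dict per annotator mapping each image ID judged
# relevant to the ID of the visual cluster (sub-topic) it was assigned to.
GroundTruth = Dict[str, int]


def cluster_recall_at_n(ranking: List[str], truth: GroundTruth, n: int) -> float:
    """Share of the annotator's clusters represented in the top-n results."""
    all_clusters = set(truth.values())
    if not all_clusters:
        return 0.0
    # Images absent from `truth` (i.e. non-relevant) cover no cluster.
    covered = {truth[img] for img in ranking[:n] if img in truth}
    return len(covered) / len(all_clusters)


def scores_per_annotator(ranking: List[str], truths: List[GroundTruth], n: int) -> List[float]:
    """Score the same ranking once against each annotator's ground truth."""
    return [cluster_recall_at_n(ranking, t, n) for t in truths]


if __name__ == "__main__":
    ranking = ["a", "b", "c", "d", "e"]
    truths = [
        {"a": 0, "b": 0, "c": 1, "d": 2},  # annotator 1: three clusters
        {"a": 0, "b": 1, "c": 2, "e": 3},  # annotator 2: four clusters
    ]
    scores = scores_per_annotator(ranking, truths, n=3)
    # Mean and spread across annotators for this one run.
    print(scores, mean(scores), pstdev(scores))
```

Under these assumptions, the same run receives different scores from different annotators. A large spread for a query suggests that the diversity judgment for that query is itself subjective.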
