CEA LIST's Participation to Visual Concept Detection Task of ImageCLEF 2011

This paper describes the CEA LIST participation in the ImageCLEF 2011 Photo Annotation challenge. This year, our motivation was to investigate annotation performance when the provided Flickr tags are used as additional information. First, we present an overview of the local and global visual features used in this work. Second, we present a new method, which we call "Fuzzy-tfidf", that takes into account the uncertainty of user tags. Our textual descriptor is based on the semantic similarity between tags and visual concepts. To compute this similarity, we used two distances: the first is based on the WordNet ontology and the second on social networks. We perform a late fusion to combine the scores from the visual and textual modalities. Our best model, a late fusion trained on global visual features and user tags, obtains 38.3 % MAP, an absolute improvement of almost 8 % MAP over our best visual-only system. The results show that combining Flickr tags with visual features improves on the run that uses visual features alone. This corroborates the importance of taking into account the uncertainty of user tags, and the complementarity of the visual and textual modalities.
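As an illustration of the late-fusion step mentioned above, the following is a minimal sketch that combines per-concept scores from a visual and a textual classifier by weighted averaging. The abstract does not specify the fusion rule, so the weighted average, the function name `late_fusion`, and the weight `alpha` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict


def late_fusion(
    visual: Dict[str, float],
    textual: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Fuse per-concept scores from two modalities (hypothetical helper).

    alpha is the weight given to the visual score; (1 - alpha) goes to the
    textual score. Weighted averaging is an assumed fusion rule.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if visual.keys() != textual.keys():
        raise ValueError("both modalities must score the same concepts")
    return {
        concept: alpha * visual[concept] + (1.0 - alpha) * textual[concept]
        for concept in visual
    }


# Example scores for one image (made-up values, for illustration only).
visual_scores = {"dog": 0.6, "sky": 0.2}
textual_scores = {"dog": 0.8, "sky": 0.4}
fused = late_fusion(visual_scores, textual_scores, alpha=0.5)
```

In practice the weight would typically be tuned on a validation set, possibly per concept, since some concepts are better captured by tags and others by visual features.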
