LEAR and XRCE's Participation in the Visual Concept Detection Task - ImageCLEF 2010

In this paper we present the joint effort of LEAR and XRCE for the ImageCLEF 2010 Visual Concept Detection and Annotation Task. Our first goal was to combine our individual state-of-the-art approaches: the Fisher vector image representation and the TagProp method for image auto-annotation. Our second goal was to investigate how annotation performance changes when extra information, in the form of the provided Flickr tags, is used. The results show that combining Flickr tags with visual features improves on every method that uses only visual features. Our winning system, an early-fusion linear SVM classifier trained on visual and Flickr-tag features, obtains 45.5% mean Average Precision (mAP), almost a 5% absolute improvement over the best visual-only system. Our best visual-only system, a late-fusion linear SVM classifier trained on two types of visual features (SIFT and colour), obtains 39.0% mAP, which is close to the best visual-only result. The performance of TagProp is close to that of our SVM classifiers. The methods presented in this paper all scale to large datasets and/or many concepts, owing to the fast Fisher kernel (FK) framework for image representation and to the classifiers used. The linear SVM classifier has been shown to scale well to large datasets. The k-NN approach of TagProp is interesting in this respect because it requires only two parameters per concept.
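The difference between the early-fusion and late-fusion linear SVM classifiers mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the features are random stand-ins rather than Fisher vectors or real Flickr-tag vectors, the trainer is a plain sub-gradient descent on the hinge loss, and the equal-weight averaging used for late fusion is an assumption. All names (`train_linear_svm`, `visual`, `tags`) are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the L2-regularised hinge loss.
    X: (n, d) features; y: (n,) labels in {-1, +1}. Returns (w, b)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples that violate the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def accuracy(scores, y):
    return float(np.mean(np.sign(scores) == y))

# Synthetic stand-ins for one concept (assumption: not real image data).
rng = np.random.default_rng(0)
n = 200
y = np.where(rng.random(n) < 0.5, 1.0, -1.0)
visual = 1.0 * y[:, None] + rng.normal(size=(n, 8))  # "visual" features
tags = 0.5 * y[:, None] + rng.normal(size=(n, 5))    # "tag" features

# Early fusion: concatenate the feature types, train a single classifier.
X_early = np.hstack([visual, tags])
w_e, b_e = train_linear_svm(X_early, y)
early_scores = X_early @ w_e + b_e

# Late fusion: train one classifier per feature type, then average scores.
w_v, b_v = train_linear_svm(visual, y)
w_t, b_t = train_linear_svm(tags, y)
late_scores = 0.5 * (visual @ w_v + b_v) + 0.5 * (tags @ w_t + b_t)

print("early fusion training accuracy:", accuracy(early_scores, y))
print("late fusion training accuracy:", accuracy(late_scores, y))
```

In early fusion a single weight vector spans both feature types, so the classifier can trade them off per dimension. In late fusion each feature type gets its own classifier and only the output scores are combined.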
