Nonparametric Estimation of Fisher Vectors to Aggregate Image Descriptors

We investigate how to represent a natural image so that the visual concepts it contains can be recognized. The core of the proposed method is a new way to aggregate local features, based on a nonparametric estimation of the Fisher vector, which results from taking the gradient of the log-likelihood. To this end, we use low-level local descriptors learned with independent component analysis (ICA), which therefore provide a statistically independent description of the images. The resulting signature has an intuitive interpretation, and we also propose an efficient implementation of it. Experiments on publicly available datasets show that the proposed image signature performs very well.
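The abstract does not give the estimator's formulas, so what follows is only a minimal sketch under stated assumptions, not the authors' method. It assumes that the ICA components are treated as independent. Under that assumption, the log-likelihood of a descriptor x factorizes as log p(x) = sum_k log p_k(x_k). It also assumes that each marginal p_k is estimated nonparametrically as a fixed-bin histogram whose bin probabilities theta_{k,b} are softmax-parameterized. The gradient of the average log-likelihood of an image's T descriptors with respect to the logits is then h_{k,b} - theta_{k,b}: the image's own per-component histogram minus the generic histogram learned on training data. That is one plausible reading of an "intuitive" signature. The function names, the shared bin edges `edges`, and the smoothing constant `eps` are hypothetical, and the sketch omits any Fisher-information normalization.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical sketch: descriptors are assumed to be already projected onto
# ICA components, and every component shares the same histogram bin edges.


def fit_marginal_histograms(
    train: Sequence[Sequence[float]], edges: Sequence[float], eps: float = 1e-6
) -> List[List[float]]:
    """Estimate one smoothed histogram (bin probabilities) per ICA component."""
    n_dims = len(train[0])
    n_bins = len(edges) + 1  # inner edges split the real line into len+1 bins
    model = []
    for k in range(n_dims):
        counts = [eps] * n_bins  # small pseudo-count avoids empty bins
        for x in train:
            counts[bisect_right(edges, x[k])] += 1.0
        total = sum(counts)
        model.append([c / total for c in counts])
    return model


def fisher_signature(
    descriptors: Sequence[Sequence[float]],
    model: Sequence[Sequence[float]],
    edges: Sequence[float],
) -> List[float]:
    """Gradient of the mean log-likelihood w.r.t. softmax logits: h - theta.

    Under the independence assumption, each component contributes its own
    block, so the signature is the concatenation of per-component blocks.
    """
    T = len(descriptors)
    signature: List[float] = []
    for k, theta_k in enumerate(model):
        hist = [0.0] * len(theta_k)
        for x in descriptors:
            hist[bisect_right(edges, x[k])] += 1.0 / T  # image histogram h_k
        signature.extend(h - t for h, t in zip(hist, theta_k))
    return signature
```

For example, with `edges = [-1.0, 0.0, 1.0]` each component gets four bins, and an image whose descriptors all fall into one bin gets a positive entry there and negative entries elsewhere. Because both h_k and theta_k sum to one, each per-component block sums to zero.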
