Beyond SIFT for image classification

For classifying images, scenes, or objects, the most popular approach is the feature extraction-coding-pooling framework, which generates discriminative and robust image representations from densely extracted local patches, mainly SIFT/HOG descriptors. Most recent research focuses on improving the coding and pooling stages. In this work, we show that substantial improvements can also be obtained by coding information closer to the pixel level, in the same way that deep-learning architectures do. We introduce a two-layer, stacked coder-pooler architecture whose first layer is dedicated to extracting, from our so-called Differential Vectors (DV) patches, local low-level features that are more discriminative and efficient than their classic handcrafted counterparts. This first layer can advantageously replace any classic dense SIFT/HOG patch extraction stage. We demonstrate the effectiveness of our approach on three datasets: UIUC-Sports, Scene 15, and Caltech 101. We achieve excellent performance with simple linear classification while using basic coding and pooling schemes for both layers, namely Sparse Coding (SC) and Max-Pooling (MP), respectively.
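As a rough illustration of the stacked coder-pooler idea, the following minimal sketch chains two sparse-coding and max-pooling stages with NumPy. It rests on several assumptions that are not taken from this work: the DV patches are not defined here, so mean-centered pixel patches stand in for them; the dictionaries are random unit-norm atoms rather than learned ones; sparse coding is solved with a plain ISTA loop; and all function names, patch sizes, and dictionary sizes are hypothetical.

```python
import numpy as np

def sparse_code(X, D, lam=0.1, n_iter=50):
    """ISTA for min_A 0.5*||X - A D||^2 + lam*||A||_1 (rows of D are atoms)."""
    L = np.linalg.norm(D @ D.T, 2)            # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        Z = A - ((A @ D - X) @ D.T) / L       # gradient step
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A

def extract_patches(img, size, stride):
    """Densely extract flattened size x size patches on a regular grid."""
    H, W = img.shape
    return np.array([img[i:i + size, j:j + size].ravel()
                     for i in range(0, H - size + 1, stride)
                     for j in range(0, W - size + 1, stride)])

def random_dictionary(k, d, rng):
    """Placeholder dictionary: k random unit-norm atoms (an assumption;
    a real system would learn the atoms from training patches)."""
    D = rng.standard_normal((k, d))
    return D / np.linalg.norm(D, axis=1, keepdims=True)

def two_layer_features(img, D1, D2, region=16, region_stride=8, patch=4):
    """Layer 1 codes small pixel patches and max-pools them per region,
    giving one local feature per region (the role dense SIFT/HOG plays).
    Layer 2 codes those local features and max-pools over the image."""
    H, W = img.shape
    local = []
    for i in range(0, H - region + 1, region_stride):
        for j in range(0, W - region + 1, region_stride):
            P = extract_patches(img[i:i + region, j:j + region], patch, patch)
            P = P - P.mean(axis=1, keepdims=True)   # stand-in for DV patches
            local.append(np.abs(sparse_code(P, D1)).max(axis=0))
    codes2 = sparse_code(np.array(local), D2)
    return np.abs(codes2).max(axis=0)

rng = np.random.default_rng(0)
img = rng.random((32, 32))                    # synthetic grayscale image
D1 = random_dictionary(32, 16, rng)           # layer 1: 4x4 patches -> 32 atoms
D2 = random_dictionary(64, 32, rng)           # layer 2: 32-dim features -> 64 atoms
feat = two_layer_features(img, D1, D2)
```

The resulting vector `feat` is what a linear classifier would consume in such a pipeline. A spatial pyramid at the second pooling stage, learned dictionaries, and the actual DV patch construction are deliberately left out of this sketch.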

[1]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[2]  Frédéric Jurie,et al.  Modeling spatial layout with fisher vectors for image categorization , 2011, 2011 International Conference on Computer Vision.

[3]  Jonghyun Choi,et al.  A complementary local feature descriptor for face identification , 2012, 2012 IEEE Workshop on the Applications of Computer Vision (WACV).

[4]  Dieter Fox,et al.  Kernel Descriptors for Visual Recognition , 2010, NIPS.

[5]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[6]  Bingbing Ni,et al.  Geometric ℓp-norm feature pooling for image classification , 2011, CVPR 2011.

[7]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[8]  Shenghuo Zhu,et al.  Deep Coding Network , 2010, NIPS.

[9]  Wen Gao,et al.  Group-sensitive multiple kernel learning for object categorization , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10]  Narendra Ahuja,et al.  Learning subcategory relevances for category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[13]  Yann LeCun,et al.  Learning Invariant Feature Hierarchies , 2012, ECCV Workshops.

[14]  Giovanni Maria Farinella,et al.  Scene categorization using bag of Textons on spatial hierarchy , 2008, 2008 15th IEEE International Conference on Image Processing.

[15]  Jean Ponce,et al.  A graph-matching kernel for object categorization , 2011, 2011 International Conference on Computer Vision.

[16]  Guillermo Sapiro,et al.  Online dictionary learning for sparse coding , 2009, ICML '09.

[17]  Liang-Tien Chia,et al.  Local features are not lonely – Laplacian sparse coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[19]  Cristian Sminchisescu,et al.  Object recognition as ranking holistic figure-ground hypotheses , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Xuelong Li,et al.  Beyond Spatial Pyramids: A New Feature Extraction Framework with Dense Spatial Sampling for Image Classification , 2012, ECCV.

[21]  Mario Fernando Montenegro Campos,et al.  Sparse Spatial Coding: A novel approach for efficient and accurate object recognition , 2012, 2012 IEEE International Conference on Robotics and Automation.

[22]  Trevor Darrell,et al.  Beyond spatial pyramids: Receptive field learning for pooled image features , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Y-Lan Boureau,et al.  Learning Hierarchical Feature Extractors For Image Recognition , 2012 .

[24]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[26]  Dieter Fox,et al.  Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms , 2011, NIPS.

[27]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[28]  Florent Perronnin,et al.  Modeling the spatial layout of images beyond spatial pyramids , 2012, Pattern Recognit. Lett..

[29]  Nicolas Le Roux,et al.  Ask the locals: Multi-way local pooling for image recognition , 2011, 2011 International Conference on Computer Vision.

[30]  Fahad Shahbaz Khan,et al.  Discriminative compact pyramids for object and scene recognition , 2012, Pattern Recognition.

[31]  Yann LeCun,et al.  Convolutional networks and applications in vision , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

[32]  Frédéric Jurie,et al.  Improving Image Classification Using Semantic Attributes , 2012, International Journal of Computer Vision.

[33]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[34]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[35]  Andrew Zisserman,et al.  Image Classification using Random Forests and Ferns , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[36]  Dieter Fox,et al.  Object recognition with hierarchical kernel descriptors , 2011, CVPR 2011.

[37]  Alfred O. Hero,et al.  Efficient learning of sparse, distributed, convolutional feature representations for object recognition , 2011, 2011 International Conference on Computer Vision.

[38]  Bill Triggs,et al.  Visual Recognition Using Local Quantized Patterns , 2012, ECCV.

[39]  Fei-Fei Li,et al.  Object-Centric Spatial Pooling for Image Classification , 2012, ECCV.

[40]  Baoxin Li,et al.  Discriminative affine sparse codes for image classification , 2011, CVPR 2011.

[41]  Eli Shechtman,et al.  Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Gaurav Sharma,et al.  Local Higher-Order Statistics (LHS) for Texture Categorization and Facial Analysis , 2012, ECCV.

[43]  John D. Lafferty,et al.  Learning image representations from the pixel level via hierarchical sparse coding , 2011, CVPR 2011.

[44]  Matthieu Cord,et al.  BOSSA: Extended bow formalism for image classification , 2011, 2011 18th IEEE International Conference on Image Processing.

[45]  Hervé Glotin,et al.  Efficient Bag of Scenes Analysis for Image Categorization , 2013, ICPRAM.

[46]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.