Adaptive spatial pooling for image classification

In this paper, we propose an adaptive spatial pooling method for enhancing the discriminability of feature representation for image classification. The core idea is to adopt a spatial distribution matrix to define how the image patches are pooled together. By formulating the pooling distribution learning and classifier training jointly, our method can extract multiple spatial layouts of arbitrary shapes rather than regular rectangular regions. By proper mathematical transformation, the distributions can be learned via a boosting-like algorithm, which improves the efficiency of learning especially for large distribution matrices. Further, our method allows category-specific pooling operations to take advantage of the different spatial layouts of different categories. Experimental results on three benchmark datasets UIUC-Sports, 21-Land-Use and Scene 15 demonstrate the effectiveness of our method. HighlightsPooling operator is parameterized as a distribution matrix.Each category enjoys its own pooling operator.The pooling distribution learning and the classifier training are jointly formulated.The problem is solved efficiently via a boosting-like algorithm.The proposed models outperform many other popular methods.

[1]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[2]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Shawn D. Newsam,et al.  Spatial pyramid co-occurrence for image classification , 2011, 2011 International Conference on Computer Vision.

[5]  Marc'Aurelio Ranzato,et al.  Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[6]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[7]  Florent Perronnin,et al.  Modeling the spatial layout of images beyond spatial pyramids , 2012, Pattern Recognit. Lett..

[8]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[9]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[10]  Qi Tian,et al.  A Boosting, Sparsity- Constrained Bilinear Model for Object Recognition , 2012, IEEE MultiMedia.

[11]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[12]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[14]  Tieniu Tan,et al.  Salient coding for image classification , 2011, CVPR 2011.

[15]  Fei-Fei Li,et al.  Object-Centric Spatial Pooling for Image Classification , 2012, ECCV.

[16]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[17]  B. Yegnanarayana,et al.  Artificial Neural Networks , 2004 .

[18]  Jun Zhu,et al.  Learning reconfigurable scene representation by tangram model , 2012, 2012 IEEE Workshop on the Applications of Computer Vision (WACV).

[19]  Luc Van Gool,et al.  Latent Dictionary Learning for Sparse Representation Based Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Mario Fritz,et al.  Learning Smooth Pooling Regions for Visual Recognition , 2013, BMVC.

[21]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[22]  Rama Chellappa,et al.  Sparse dictionary-based representation and recognition of action attributes , 2011, 2011 International Conference on Computer Vision.

[23]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  Bingbing Ni,et al.  Geometric ℓp-norm feature pooling for image classification , 2011, CVPR 2011.

[25]  Yuning Jiang,et al.  Randomized Spatial Partition for Scene Recognition , 2012, ECCV.

[26]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[27]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[28]  Liang Wang,et al.  Multiple spatial pooling for visual object recognition , 2014, Neurocomputing.

[29]  Robert E. Schapire,et al.  The Boosting Approach to Machine Learning An Overview , 2003 .

[30]  Nicolas Le Roux,et al.  Ask the locals: Multi-way local pooling for image recognition , 2011, 2011 International Conference on Computer Vision.

[31]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[32]  Rui Zhang,et al.  Image Classification by Hierarchical Spatial Pooling with Partial Least Squares Analysis , 2012, BMVC.

[33]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[34]  Guillermo Sapiro,et al.  Supervised Dictionary Learning , 2008, NIPS.

[35]  Cordelia Schmid,et al.  Learning Object Representations for Visual Object Class Recognition , 2007, ICCV 2007.

[36]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[37]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Matti Pietikäinen,et al.  Performance evaluation of texture measures with classification based on Kullback discrimination of distributions , 1994, Proceedings of 12th International Conference on Pattern Recognition.

[39]  Guillermo Sapiro,et al.  Classification and clustering via dictionary learning with structured incoherence and shared features , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[40]  Lei Wang,et al.  In defense of soft-assignment coding , 2011, 2011 International Conference on Computer Vision.

[41]  Yasuo Kuniyoshi,et al.  Discriminative spatial pyramid , 2011, CVPR 2011.

[42]  Luca Maria Gambardella,et al.  Flexible, High Performance Convolutional Neural Networks for Image Classification , 2011, IJCAI.

[43]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[44]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[45]  Anil K. Jain,et al.  Artificial Neural Networks: A Tutorial , 1996, Computer.

[46]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[47]  Jean Ponce,et al.  A Theoretical Analysis of Feature Pooling in Visual Recognition , 2010, ICML.

[48]  Andrew Zisserman,et al.  Efficient Additive Kernels via Explicit Feature Maps , 2012, IEEE Trans. Pattern Anal. Mach. Intell..

[49]  Trevor Darrell,et al.  Beyond spatial pyramids: Receptive field learning for pooled image features , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[51]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.