High-Order Local Pooling and Encoding Gaussians Over a Dictionary of Gaussians

Local pooling (LP) in configuration (feature) space proposed by Boureau et al. explicitly restricts similar features to be aggregated, which can preserve as much discriminative information as possible. At the time it appeared, this method combined with sparse coding achieved competitive classification results with only a small dictionary. However, its performance lags far behind the state-of-the-art results as only the zero-order information is exploited. Inspired by the success of high-order statistical information in existing advanced feature coding or pooling methods, we make an attempt to address the limitation of LP. To this end, we present a novel method called high-order LP (HO-LP) to leverage the information higher than the zero-order one. Our idea is intuitively simple: we compute the first- and second-order statistics per configuration bin and model them as a Gaussian. Accordingly, we employ a collection of Gaussians as visual words to represent the universal probability distribution of features from all classes. Our problem is naturally formulated as encoding Gaussians over a dictionary of Gaussians as visual words. This problem, however, is challenging since the space of Gaussians is not a Euclidean space but forms a Riemannian manifold. We address this challenge by mapping Gaussians into the Euclidean space, which enables us to perform coding with common Euclidean operations rather than complex and often expensive Riemannian operations. Our HO-LP preserves the advantages of the original LP: pooling only similar features and using a small dictionary. Meanwhile, it achieves very promising performance on standard benchmarks, with either conventional, hand-engineered features or deep learning-based features.

[1]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[2]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Xiu-Shen Wei,et al.  Deep Spatial Pyramid: The Devil is Once Again in the Details , 2015, ArXiv.

[4]  Lei Zhang,et al.  Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Fatih Murat Porikli,et al.  Learning on lie groups for invariant detection and tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Nuno Vasconcelos,et al.  Scene classification with semantic Fisher vectors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[9]  Jean Ponce,et al.  A Theoretical Analysis of Feature Pooling in Visual Recognition , 2010, ICML.

[10]  Nuno Vasconcelos,et al.  The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition , 2004, ECCV.

[11]  Rita Cucchiara,et al.  GOLD: Gaussians of Local Descriptors for image representation , 2015, Comput. Vis. Image Underst..

[12]  Zuxi Wang,et al.  Extended Shape of Gaussian: Feature descriptor based on element set of matrix Lie group , 2013 .

[13]  N. Ayache,et al.  Log‐Euclidean metrics for fast and simple calculus on diffusion tensors , 2006, Magnetic resonance in medicine.

[14]  B. Hall Lie Groups, Lie Algebras, and Representations: An Elementary Introduction , 2004 .

[15]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[16]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[17]  Fang Liu,et al.  Shape of Gaussians as feature descriptors , 2009, CVPR.

[18]  Bowen Zhang,et al.  Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition , 2016, IEEE Transactions on Image Processing.

[19]  Nicholas Ayache,et al.  A Fast and Log-Euclidean Polyaffine Framework for Locally Linear Registration , 2009, Journal of Mathematical Imaging and Vision.

[20]  Andrew Zisserman,et al.  Deep Fisher Networks for Large-Scale Image Classification , 2013, NIPS.

[21]  Edward H. Adelson,et al.  Recognizing Materials Using Perceptually Inspired Features , 2013, International Journal of Computer Vision.

[22]  Mohammed Bennamoun,et al.  A Discriminative Representation of Convolutional Features for Indoor Scene Recognition , 2015, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.

[23]  Tieniu Tan,et al.  Feature Coding in Image Classification: A Comprehensive Study , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  B. Hall Lie Groups, Lie Algebras, and Representations , 2003 .

[25]  Nicolas Le Roux,et al.  Ask the locals: Multi-way local pooling for image recognition , 2011, 2011 International Conference on Computer Vision.

[26]  Yasuo Kuniyoshi,et al.  Global Gaussian approach for scene categorization using information geometry , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27]  Ankur Agarwal,et al.  Hyperfeatures - Multilevel Local Coding for Visual Recognition , 2006, ECCV.

[28]  Qilong Wang,et al.  From dictionary of visual words to subspaces: Locality-constrained affine subspace coding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Yizhou Yu,et al.  Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Giorgio Metta,et al.  Ask the Image: Supervised Pooling to Preserve Feature Locality , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[32]  Mehrtash Tafazzoli Harandi,et al.  Riemannian coding and dictionary learning: Kernels to the rescue , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Martial Hebert,et al.  Self-explanatory Sparse Representation for Image Classification , 2014, ECCV.

[34]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[35]  Zhen Li,et al.  Hierarchical Gaussianization for image classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[36]  Subhransu Maji,et al.  Deep filter banks for texture recognition and segmentation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[38]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[39]  Qi Tian,et al.  Ieee Transactions on Image Processing Spatial Pooling of Heterogeneous Features for Image Classification , 2022 .

[40]  Shuicheng Yan,et al.  Hybrid CNN and Dictionary-Based Models for Scene Recognition and Domain Adaptation , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[41]  Lei Zhang,et al.  Log-Euclidean Kernels for Sparse Representation and Dictionary Learning , 2013, 2013 IEEE International Conference on Computer Vision.

[42]  Cristian Sminchisescu,et al.  Free-Form Region Description with Second-Order Pooling , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Iasonas Kokkinos,et al.  Describing Textures in the Wild , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Takumi Kobayashi,et al.  Dirichlet-Based Histogram Feature Transform for Image Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[46]  Lior Wolf,et al.  Associating neural word embeddings with deep image representations using Fisher Vectors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Anton van den Hengel,et al.  The treasure beneath convolutional layers: Cross-convolutional-layer pooling for image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[49]  Limin Wang,et al.  Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics , 2014, ECCV.

[50]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[51]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[52]  Lei Wang,et al.  In defense of soft-assignment coding , 2011, 2011 International Conference on Computer Vision.

[53]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[54]  Lei Zhang,et al.  Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification. , 2017, IEEE transactions on pattern analysis and machine intelligence.

[55]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[56]  Dieter Fox,et al.  Multipath Sparse Coding Using Hierarchical Matching Pursuit , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[58]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[59]  Luis Herranz,et al.  Scene Recognition with CNNs: Objects, Scales and Dataset Bias , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Lei Zhang,et al.  A Novel Earth Mover's Distance Methodology for Image Matching with Gaussian Mixture Models , 2013, 2013 IEEE International Conference on Computer Vision.

[61]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[62]  Liyu Gong,et al.  Shape of Gaussians as feature descriptors , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[63]  Lei Zhang,et al.  Towards effective codebookless model for image classification , 2015, Pattern Recognit..

[64]  Edward H. Adelson,et al.  Material perception: What can you see in a brief glance? , 2010 .

[65]  J. Gallier Logarithms and Square Roots of Real Matrices , 2008, 0805.0245.