Power Normalizations in Fine-Grained Image, Few-Shot Image and Graph Classification

Power Normalizations (PN) are non-linear operators that tackle feature imbalances in classification problems. We study PNs in the deep learning setup via a novel PN layer that pools feature maps. Our layer combines the feature vectors and their respective spatial locations in the feature maps produced by the last convolutional layer of a CNN into a positive definite matrix of second-order statistics, to which PN operators are applied, forming so-called Second-order Pooling (SOP). As the main goal of this paper is to study Power Normalizations, we investigate the role and meaning of MaxExp and Gamma, two popular PN functions. To this end, we provide probabilistic interpretations of these element-wise operators and derive surrogates with well-behaved derivatives for end-to-end training. Furthermore, we examine the spectral applicability of MaxExp and Gamma by studying Spectral Power Normalizations (SPN). We show that SPN on the autocorrelation/covariance matrix and the Heat Diffusion Process (HDP) on a graph Laplacian matrix are closely related and thus share their properties. This finding leads to the culmination of our work: a fast spectral MaxExp, which is a variant of HDP for covariance/autocorrelation matrices. We evaluate our ideas on fine-grained recognition, scene recognition, and material classification, as well as on few-shot learning and graph classification.
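As a rough illustration of the operators named above, the following NumPy sketch builds a second-order (autocorrelation) matrix from feature vectors and applies Gamma and MaxExp both element-wise and spectrally. The abstract does not state the formulas, so the forms used here are assumptions: Gamma as `p**gamma`, MaxExp as `1 - (1 - p)**eta`, and the fast spectral MaxExp as `I - (I - M)**eta` on a trace-normalized matrix. All function names, the sign handling, and the parameter values are hypothetical, and spatial locations are omitted for brevity.

```python
import numpy as np

def second_order_pool(feats):
    """Autocorrelation matrix of N feature vectors (rows of feats, shape N x d)."""
    return feats.T @ feats / feats.shape[0]

def gamma_pn(p, gamma=0.5):
    """Element-wise Gamma (assumed form): sign(p) * |p|^gamma."""
    return np.sign(p) * np.abs(p) ** gamma

def maxexp_pn(p, eta=10):
    """Element-wise MaxExp (assumed form): sign(p) * (1 - (1 - |p|)^eta).
    Inputs are assumed to be scaled so that |p| <= 1."""
    return np.sign(p) * (1.0 - (1.0 - np.abs(p)) ** eta)

def spectral_pn(m, fn):
    """Apply a PN function to the eigenvalues of a trace-normalized PSD matrix."""
    m = m / np.trace(m)
    lam, u = np.linalg.eigh(m)
    lam = np.clip(lam, 0.0, 1.0)  # guard against tiny negative round-off
    return (u * fn(lam)) @ u.T

def fast_spectral_maxexp(m, eta=10):
    """Spectral MaxExp without eigendecomposition (integer eta assumed):
    I - (I - M / tr(M))^eta, computed with matrix powers."""
    m = m / np.trace(m)
    eye = np.eye(m.shape[0])
    return eye - np.linalg.matrix_power(eye - m, eta)

# Toy data standing in for a 7x7 feature map with 8 channels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((49, 8))
M = second_order_pool(feats)

# Element-wise variants act on the entries of the scaled matrix.
M_scaled = M / np.abs(M).max()
elem_gamma = gamma_pn(M_scaled, gamma=0.5)
elem_maxexp = maxexp_pn(M_scaled, eta=10)

# Spectral variants act on the eigenvalues; the two routes should agree.
slow = spectral_pn(M, lambda lam: maxexp_pn(lam, eta=10))
fast = fast_spectral_maxexp(M, eta=10)
print(np.allclose(slow, fast))
```

Under these assumptions, the matrix-power route avoids the eigendecomposition and its derivative during back-propagation, which is one plausible reading of why a "fast" spectral variant is attractive; with `eta` a power of two, the matrix power can be obtained by repeated squaring.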
