What Deep CNNs Benefit From Global Covariance Pooling: An Optimization Perspective

Recent works have demonstrated that global covariance pooling (GCP) can improve the performance of deep convolutional neural networks (CNNs) on visual classification tasks. Despite considerable advances, the reasons for the effectiveness of GCP in deep CNNs have not been well studied. In this paper, we attempt to understand what deep CNNs gain from GCP from an optimization perspective. Specifically, we explore the effect of GCP on deep CNNs in terms of the Lipschitzness of the optimization loss and the predictiveness of gradients, and show that GCP makes the optimization landscape smoother and the gradients more predictive. Furthermore, we discuss the connection between GCP and second-order optimization for deep CNNs. More importantly, these findings account for several merits of covariance pooling for training deep CNNs that were previously unrecognized or not fully explored, including significant acceleration of network convergence (i.e., networks trained with GCP tolerate rapid decay of learning rates, achieving favorable performance while significantly reducing the number of training epochs), stronger robustness to distorted examples generated by image corruptions and perturbations, and good generalization to other vision tasks, e.g., object detection and instance segmentation. We conduct extensive experiments using various deep CNN architectures on diversified tasks, and the results provide strong support for our findings.
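To make the operation under study concrete, below is a minimal PyTorch-style sketch of a GCP head: the sample covariance matrix of the final convolutional features, followed by matrix square-root normalization computed with the coupled Newton-Schulz iteration popularized by iSQRT-COV. The function name, tensor shapes, and iteration count are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def global_covariance_pooling(x, iterations=5, eps=1e-5):
    """Illustrative GCP head (a sketch, not the paper's exact code).

    x: convolutional feature map of shape (B, C, H, W).
    Returns the matrix-square-root-normalized covariance, shape (B, C, C).
    """
    B, C, H, W = x.shape
    n = H * W
    feats = x.reshape(B, C, n)

    # Sample covariance over the n spatial positions (features are centered).
    centered = feats - feats.mean(dim=2, keepdim=True)
    cov = centered.bmm(centered.transpose(1, 2)) / n

    # Pre-scale by the trace so the Newton-Schulz iteration converges.
    identity = torch.eye(C, device=x.device, dtype=x.dtype).expand(B, C, C)
    trace = cov.diagonal(dim1=1, dim2=2).sum(dim=1).view(B, 1, 1) + eps
    a = cov / trace

    # Coupled Newton-Schulz iteration: y -> a^{1/2}, z -> a^{-1/2}.
    y, z = a, identity
    for _ in range(iterations):
        t = 0.5 * (3.0 * identity - z.bmm(y))
        y, z = y.bmm(t), t.bmm(z)

    # Undo the pre-scaling to recover cov^{1/2}.
    return y * trace.sqrt()
```

In a classification network, the (B, C, C) output is typically flattened (e.g., its upper triangle, since the matrix is symmetric) and fed to the final linear classifier in place of the global-average-pooled vector.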
