Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training

Normalization techniques have become a basic component in modern convolutional neural networks (ConvNets). In particular, many recent works demonstrate that promoting orthogonality of the weights helps train deep models and improves robustness. For ConvNets, most existing methods are based on penalizing or normalizing weight matrices derived from concatenating or flattening the convolutional kernels. These methods often destroy or ignore the benign convolutional structure of the kernels and are therefore expensive or impractical for deep ConvNets. In contrast, we introduce a simple and efficient “Convolutional Normalization” (ConvNorm) method that fully exploits the convolutional structure in the Fourier domain and serves as a plug-and-play module that can be conveniently incorporated into any ConvNet. Our method is inspired by recent work on preconditioning methods for convolutional sparse coding and effectively promotes each layer’s channel-wise isometry. Furthermore, we show that ConvNorm reduces the layerwise spectral norm of the weight matrices and hence improves the Lipschitzness of the network, leading to easier training and improved robustness for deep ConvNets. Applied to classification under noise corruption and to generative adversarial networks (GANs), ConvNorm improves the robustness of common ConvNets such as ResNet and the performance of GAN training. We verify our findings via numerical experiments on CIFAR and ImageNet. Our implementation is available online at https://github.com/shengliu66/ConvNorm.
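To make the channel-wise isometry idea concrete, the sketch below shows one way a Fourier-domain, per-channel normalization could look in PyTorch: a circular convolution is computed in the frequency domain, and each output channel is rescaled, frequency by frequency, so that its row of kernel spectra has unit energy. This is a minimal illustration under our own reading of the abstract, not the released implementation; the function name `convnorm_circular`, the periodic (circular) boundary assumption, and the `eps` constant are illustrative choices.

```python
# Minimal sketch of a channel-wise Fourier-domain normalization in the spirit
# of ConvNorm (illustrative only; see the authors' repository for the actual
# implementation). Assumes circular convolution so that spectra factorize.
import torch
import torch.fft


def convnorm_circular(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Circular convolution followed by per-channel spectral normalization.

    x:      (N, C_in, H, W) input feature map
    weight: (C_out, C_in, k, k) convolution kernels
    """
    N, C_in, H, W = x.shape

    # Spectra of inputs and (zero-padded) kernels on the H x W grid.
    X = torch.fft.fft2(x, s=(H, W))            # (N, C_in, H, W)
    K = torch.fft.fft2(weight, s=(H, W))       # (C_out, C_in, H, W)

    # Plain circular convolution in the Fourier domain:
    # Y[n, i] = sum_j K[i, j] * X[n, j]
    Y = torch.einsum('ijhw,njhw->nihw', K, X)  # (N, C_out, H, W)

    # Channel-wise preconditioner: at every frequency, divide by the l2 norm
    # of the i-th row of kernel spectra so that sum_j |K_ij(w)|^2 = 1, i.e.
    # each output channel's filter bank acts as an isometry on its row space.
    row_energy = (K.abs() ** 2).sum(dim=1)     # (C_out, H, W)
    Y = Y / torch.sqrt(row_energy + eps)

    # Back to the spatial domain; imaginary parts are numerical noise.
    return torch.fft.ifft2(Y).real
```

In a network, such a normalization would typically be attached to each convolutional layer as a plug-and-play module; it is written here in a standalone functional form only for clarity.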
