Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks

Lipschitz constraints under L2 norm on deep neural networks are useful for provable adversarial robustness bounds, stable training, and Wasserstein distance estimation. While heuristic approaches such as the gradient penalty have seen much practical success, it is challenging to achieve similar practical performance while provably enforcing a Lipschitz constraint. In principle, one can design Lipschitz constrained architectures using the composition property of Lipschitz functions, but Anil et al. recently identified a key obstacle to this approach: gradient norm attenuation. They showed how to circumvent this problem in the case of fully connected networks by designing each layer to be gradient norm preserving. We extend their approach to train scalable, expressive, provably Lipschitz convolutional networks. In particular, we present the Block Convolution Orthogonal Parameterization (BCOP), an expressive parameterization of orthogonal convolution operations. We show that even though the space of orthogonal convolutions is disconnected, the largest connected component of BCOP with 2n channels can represent arbitrary BCOP convolutions over n channels. Our BCOP parameterization allows us to train large convolutional networks with provable Lipschitz bounds. Empirically, we find that it is competitive with existing approaches to provable adversarial robustness and Wasserstein distance estimation.

[1]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[2]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[3]  Arnold W. M. Smeulders,et al.  i-RevNet: Deep Invertible Networks , 2018, ICLR.

[4]  J. Zico Kolter,et al.  Scaling provable adversarial defenses , 2018, NeurIPS.

[5]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[6]  Ritu Chadha,et al.  Limitations of the Lipschitz constant as a defense against adversarial examples , 2018, Nemesis/UrbReas/SoGood/IWAISe/GDM@PKDD/ECML.

[7]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[8]  Jascha Sohl-Dickstein,et al.  Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10, 000-Layer Vanilla Convolutional Neural Networks , 2018, ICML.

[9]  Francesco Croce,et al.  Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$ , 2019, ICLR.

[10]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[11]  Cem Anil,et al.  Sorting out Lipschitz function approximation , 2018, ICML.

[12]  Mario Lezcano Casado,et al.  Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group , 2019, ICML.

[13]  Matthias Hein,et al.  Provable Robustness of ReLU networks via Maximization of Linear Regions , 2018, AISTATS.

[14]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[15]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[16]  Zeynep Akata,et al.  Primal-Dual Wasserstein GAN , 2018, ArXiv.

[17]  Moustapha Cissé,et al.  Parseval Networks: Improving Robustness to Adversarial Examples , 2017, ICML.

[18]  Å. Björck,et al.  An Iterative Algorithm for Computing the Best Estimate of an Orthogonal Matrix , 1971 .

[19]  Artem N. Chernodub,et al.  Norm-preserving Orthogonal Permutation Linear Unit Activation Functions (OPLU) , 2016, ArXiv.

[20]  W. Brendel,et al.  Foolbox: A Python toolbox to benchmark the robustness of machine learning models , 2017 .

[21]  Nicolas Pinto,et al.  Beyond simple features: A large-scale feature search approach to unconstrained face recognition , 2011, Face and Gesture 2011.

[22]  Todd Huster,et al.  Universal Lipschitz Approximation in Bounded Depth Neural Networks , 2019, ArXiv.

[23]  J. Zico Kolter,et al.  Certified Adversarial Robustness via Randomized Smoothing , 2019, ICML.

[24]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[25]  Ian Goodfellow,et al.  Deep Learning with Differential Privacy , 2016, CCS.

[26]  Philip M. Long,et al.  The Singular Values of Convolutional Layers , 2018, ICLR.

[27]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[28]  Aleksander Madry,et al.  Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability , 2018, ICLR.

[29]  Yuichi Yoshida,et al.  Spectral Norm Regularization for Improving the Generalizability of Deep Learning , 2017, ArXiv.

[30]  Matthias Bethge,et al.  Towards the first adversarially robust neural network model on MNIST , 2018, ICLR.

[31]  Huichen Lihuichen DECISION-BASED ADVERSARIAL ATTACKS: RELIABLE ATTACKS AGAINST BLACK-BOX MACHINE LEARNING MODELS , 2017 .

[32]  Chih-Hong Cheng,et al.  Maximum Resilience of Artificial Neural Networks , 2017, ATVA.

[33]  Bernhard Pfahringer,et al.  Regularisation of neural networks by enforcing Lipschitz continuity , 2018, Machine Learning.

[34]  Surya Ganguli,et al.  Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice , 2017, NIPS.

[35]  Greg Yang,et al.  Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers , 2019, NeurIPS.

[36]  Aleksander Madry,et al.  On Evaluating Adversarial Robustness , 2019, ArXiv.

[37]  Matthias Bethge,et al.  Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models , 2017, ArXiv.

[38]  Michael Unser,et al.  Hessian Schatten-Norm Regularization for Linear Inverse Problems , 2012, IEEE Transactions on Image Processing.

[39]  Matthias Hein,et al.  Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation , 2017, NIPS.

[40]  Gabriel Peyré,et al.  Computational Optimal Transport , 2018, Found. Trends Mach. Learn..

[41]  Guillermo Sapiro,et al.  Robust Large Margin Deep Neural Networks , 2017, IEEE Transactions on Signal Processing.

[42]  Dimitri P. Bertsekas,et al.  Nonlinear Programming , 1997 .

[43]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[44]  Mykel J. Kochenderfer,et al.  Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks , 2017, CAV.

[45]  Matthias Hein,et al.  Provable robustness against all adversarial lp-perturbations for p≥1 , 2019, ArXiv.

[46]  Masashi Sugiyama,et al.  Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks , 2018, NeurIPS.

[47]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[48]  Surya Ganguli,et al.  The Emergence of Spectral Universality in Deep Networks , 2018, AISTATS.

[49]  Haifeng Qian,et al.  L2-Nonexpansive Neural Networks , 2018, ICLR.

[50]  J. Kautsky,et al.  A Matrix Approach to Discrete Wavelets , 1994 .