Lipschitz regularized Deep Neural Networks generalize and are adversarially robust

In this work we study input gradient regularization of deep neural networks, and demonstrate that such regularization leads to generalization proofs and improved adversarial robustness. The proof of generalization does not overcome the curse of dimensionality, but it is independent of the number of layers in the network. The adversarial robustness regularization combines adversarial training, which we show to be equivalent to Total Variation regularization, with Lipschitz regularization. We demonstrate empirically that the regularized models are more robust, and that gradient norms of images can be used for attack detection.
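
The combined objective and the gradient-norm detector described above can be sketched in a few lines. The following is a minimal PyTorch illustration, not the paper's exact formulation: the penalty weight `lam`, the squared 2-norm penalty, and the use of the model's own predicted label in the detector are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def lipschitz_regularized_loss(model, x, y, lam=0.1):
    # Cross-entropy plus a penalty on the input gradient of the loss.
    # `lam` and the squared 2-norm penalty are illustrative choices only.
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # Gradient of the scalar loss with respect to the input batch;
    # create_graph=True so the penalty itself can be backpropagated.
    grad_x, = torch.autograd.grad(loss, x, create_graph=True)
    penalty = grad_x.flatten(1).norm(p=2, dim=1).pow(2).mean()
    return loss + lam * penalty

def input_gradient_norms(model, x):
    # Per-example 2-norm of the input gradient of the loss at the
    # model's predicted label; unusually large values can be flagged
    # as suspected adversarial inputs, with the threshold chosen on
    # clean validation data.
    x = x.clone().requires_grad_(True)
    logits = model(x)
    loss = F.cross_entropy(logits, logits.argmax(dim=1), reduction="sum")
    grad_x, = torch.autograd.grad(loss, x)
    return grad_x.flatten(1).norm(p=2, dim=1)
```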
