On Regularization and Robustness of Deep Neural Networks

In this work, we study the connection between regularization and robustness of deep neural networks by viewing them as elements of a reproducing kernel Hilbert space (RKHS) of functions and by regularizing them with the RKHS norm. Since this norm cannot be computed exactly, we consider approximations based on upper and lower bounds. These approximations yield new regularization strategies and also recover existing ones, such as spectral norm penalties or constraints, gradient penalties, and adversarial training. Moreover, the kernel framework allows us to derive margin-based bounds on adversarial generalization. We show that our new algorithms bring empirical benefits for learning on small datasets and for training adversarially robust models. We also discuss implications of our regularization framework for learning implicit generative models.
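To make the two kinds of approximation concrete, below is a minimal PyTorch sketch, not the authors' implementation, of two RKHS-norm surrogates named above: a gradient penalty on inputs (a lower-bound style surrogate, in the spirit of double backpropagation) and a spectral norm penalty on layer weights (an upper-bound style surrogate). The toy model, the cross-entropy loss, and the penalty weights are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_penalty(model, x, y):
    # Lower-bound-style surrogate: penalize the norm of the loss gradient
    # with respect to the input (cf. double backpropagation).
    x = x.detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x, create_graph=True)
    return grad.flatten(1).norm(dim=1).mean()

def spectral_penalty(model):
    # Upper-bound-style surrogate: sum of squared spectral norms
    # (largest singular values) of the linear layers.
    pen = 0.0
    for m in model.modules():
        if isinstance(m, nn.Linear):
            pen = pen + torch.linalg.svdvals(m.weight)[0] ** 2
    return pen

# Toy usage; the penalty weights 0.1 and 0.01 are hypothetical.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss = F.cross_entropy(model(x), y)
loss = loss + 0.1 * gradient_penalty(model, x, y) + 0.01 * spectral_penalty(model)
loss.backward()
```

In this reading, the penalty weights trade off robustness against data fit, and adversarial training can be viewed as yet another lower-bound surrogate, obtained by maximizing the loss over perturbed inputs rather than penalizing its input gradient.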
