Towards Understanding the Regularization of Adversarial Robustness on Neural Networks

The problem of adversarial examples has shown that modern Neural Network (NN) models can be rather fragile. Among the more established techniques for addressing the problem, one is to require the model to be {\it $\epsilon$-adversarially robust} (AR), i.e., to require that the predicted label not change when a given input example is perturbed within a certain range. However, such methods are observed to degrade standard performance, i.e., performance on natural examples. In this work, we study this degradation from the regularization perspective. We identify quantities from the generalization analysis of NNs; using these quantities, we empirically find that AR is achieved by regularizing/biasing NNs towards less confident solutions: the changes in the feature space of most layers (induced by changes in the instance space) are made smoother, uniformly in all directions, which to a certain extent prevents sudden changes in prediction under perturbations. However, such smoothing ultimately concentrates samples around decision boundaries, yielding less confident solutions and worse standard performance. Our study suggests that AR might be built into NNs in a gentler way to avoid this problematic regularization.
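
To make the $\epsilon$-AR requirement concrete, it and the min-max training objective commonly used to enforce it (e.g., adversarial training in the style of Madry et al.) can be sketched as follows; this is the standard formulation from the adversarial-robustness literature, written here for reference rather than taken verbatim from the paper:
\[
f_\theta(x + \delta) = f_\theta(x) \quad \forall\, \|\delta\| \le \epsilon,
\qquad \text{enforced via} \qquad
\min_{\theta}\; \mathbb{E}_{(x,y)}\Big[\max_{\|\delta\| \le \epsilon} \ell\big(f_\theta(x + \delta),\, y\big)\Big].
\]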
