On Adversarial Risk and Training

In this work we formally define the notions of adversarial perturbation, adversarial risk, and adversarial training, and analyze their properties. Our analysis provides several insights into adversarial risk and adversarial training, and into their relation to classification risk and "traditional" training. We also show that adversarial training can yield models with better classification accuracy and better explainability than traditional training. Although adversarial training is computationally expensive, our results and insights suggest that one should prefer adversarial training over traditional risk minimization when learning complex models from data.
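
For concreteness, a standard formulation of these quantities from the adversarial robustness literature is sketched below; the paper's formal definitions may differ in details such as the choice of loss $\ell$, perturbation set, or norm. Here $\mathcal{D}$ denotes the data distribution, $f$ a classifier, and $\epsilon$ the perturbation budget.

$$
R(f) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(f(x),\,y)\big],
\qquad
R_{\mathrm{adv}}(f) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\sup_{\|\delta\|\le\epsilon}\ell\big(f(x+\delta),\,y\big)\Big].
$$

Under this formulation, traditional training minimizes the empirical estimate of the classification risk $R(f)$, while adversarial training minimizes the empirical estimate of the adversarial risk $R_{\mathrm{adv}}(f)$, i.e., it solves the min-max problem $\min_f \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\sup_{\|\delta\|\le\epsilon}\ell(f(x+\delta),y)\big]$.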
