Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations

Adversarial examples are malicious inputs crafted to induce misclassification. Commonly studied sensitivity-based adversarial examples introduce semantically small changes to an input that result in a different model prediction. This paper studies a complementary failure mode: invariance-based adversarial examples, which introduce minimal semantic changes that modify an input's true label yet preserve the model's prediction. We demonstrate fundamental tradeoffs between these two types of adversarial examples. We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks, and that new approaches are needed to resist both attack types. In particular, we break state-of-the-art adversarially trained and certifiably robust models by generating small perturbations that the models are (provably) robust to, yet that change an input's class according to human labelers. Finally, we formally show that the existence of excessively invariant classifiers arises from the presence of overly robust predictive features in standard datasets.
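
To make the distinction concrete, the following is a minimal formalization sketch, assuming a classifier $f$, a human labeling oracle $\mathcal{O}$, a norm $\|\cdot\|$, and a perturbation budget $\epsilon$ (this notation is illustrative rather than taken verbatim from the paper). Starting from a correctly classified input $x$, a perturbed input $x^*$ is

$$\text{sensitivity-based:}\quad \|x^* - x\| \le \epsilon,\quad \mathcal{O}(x^*) = \mathcal{O}(x),\quad f(x^*) \neq f(x),$$
$$\text{invariance-based:}\quad \|x^* - x\| \le \epsilon,\quad \mathcal{O}(x^*) \neq \mathcal{O}(x),\quad f(x^*) = f(x).$$

Under these definitions, a model certified robust on the entire $\epsilon$-ball necessarily keeps its prediction on any invariance-based example found inside that ball, even though the true label has changed, which is the tension described above.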
