Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness

Adversarial examples are malicious inputs crafted to cause a model to misclassify them. Their most common instantiation, "perturbation-based" adversarial examples, introduces changes to the input that leave its true label unchanged yet result in a different model prediction. Conversely, "invariance-based" adversarial examples introduce changes to the input that leave the model's prediction unaffected even though the input's true label has changed. In this paper, we demonstrate that robustness to perturbation-based adversarial examples is not only insufficient for general robustness, but worse, it can also increase the model's vulnerability to invariance-based adversarial examples. In addition to analytical constructions, we empirically study vision classifiers with state-of-the-art robustness to perturbation-based adversaries constrained by an $\ell_p$ norm. We mount attacks that exploit excessive model invariance in task-relevant directions and are able to find adversarial examples within the $\ell_p$ ball. In fact, we find that classifiers trained to be $\ell_p$-norm robust are more vulnerable to invariance-based adversarial examples than their undefended counterparts. Excessive invariance is not limited to models trained to be robust to perturbation-based $\ell_p$-norm adversaries. More broadly, we argue that the term "adversarial example" is used to capture a series of model limitations, some of which may not have been discovered yet. Accordingly, we call for a set of precise definitions that taxonomize and address each of these shortcomings in learning.
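To make the perturbation/invariance distinction concrete, the following minimal sketch constructs an invariance-based adversarial example inside an $\ell_\infty$ ball. It is not the attack evaluated in the paper: the `oracle` (a stand-in for the true labeling function), the deliberately over-invariant `model`, and the `invariance_attack` routine are all illustrative assumptions chosen so the example runs end to end.

```python
# Sketch only: a toy oracle, a toy over-invariant classifier, and a blending
# search for an input whose TRUE label flips while the MODEL's prediction
# stays fixed, all within an l_inf ball of radius eps around the original.
import numpy as np

def oracle(x):
    """Stand-in for the ground-truth label: class 1 iff overall intensity is high."""
    return int(x.mean() > 0.5)

def model(x):
    """Excessively invariant classifier: it only looks at the top-left pixel."""
    return int(x[0, 0] > 0.5)

def invariance_attack(x, target, eps, steps=200):
    """Blend x toward `target` (an input of a different true class), project each
    candidate onto the l_inf ball of radius eps around x, and return the first
    candidate whose true label has flipped while the model's prediction has not."""
    pred = model(x)
    for alpha in np.linspace(0.0, 1.0, steps):
        candidate = np.clip((1 - alpha) * x + alpha * target, x - eps, x + eps)
        if model(candidate) == pred and oracle(candidate) != oracle(x):
            return candidate
    return None

# Toy data: a dim image (true class 0) and a bright target (true class 1) that
# agrees with x on the single pixel the model actually reads.
x = np.full((28, 28), 0.1)
target = np.full((28, 28), 0.9)
target[0, 0] = x[0, 0]

x_inv = invariance_attack(x, target, eps=0.45)
print("true label:", oracle(x), "->", oracle(x_inv))   # 0 -> 1
print("model pred:", model(x), "->", model(x_inv))     # 0 -> 0
print("l_inf distance:", np.abs(x_inv - x).max())      # <= 0.45
```

Under these toy assumptions, the blended input crosses the oracle's decision boundary (its true label flips) while staying within $\ell_\infty$ distance 0.45 of the original, yet the over-invariant model's prediction never changes; this is the failure mode that excessive invariance makes possible inside a norm ball a robust model is trained to ignore.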
