Increasing Confidence in Adversarial Robustness Evaluations

Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations. However, only a handful of these defenses have held up to their claims, because correctly evaluating robustness is extremely challenging: weak attacks often fail to find adversarial examples even when they exist, making a vulnerable network appear robust. In this paper, we propose a test that identifies weak attacks, and thus weak defense evaluations. Our test slightly modifies a neural network so that an adversarial example is guaranteed to exist for every sample. Consequently, any correct attack must succeed in breaking this modified network. For eleven out of thirteen previously published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it. We hope that attack unit tests such as ours will become a major component of future robustness evaluations and increase confidence in an empirical field that is currently riddled with skepticism. Online version & code: zimmerrol.github.io/active-tests/
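
To make the idea concrete, below is a minimal sketch (assuming PyTorch; this is not the authors' actual construction, and names such as `PlantedAdversarialModel`, `planted_delta`, and `wrong_label` are hypothetical) of how a classifier could be wrapped so that, for a given test sample, a known in-budget perturbation is guaranteed to flip the prediction. Any sufficiently strong attack run against such a modified model must then drive robust accuracy to zero on that sample; an attack that fails to do so is flagged as too weak to trust in a defense evaluation.

```python
# Minimal sketch, assuming PyTorch. This is NOT the paper's actual test; the class
# and argument names are hypothetical and only illustrate the core idea: modify a
# network so that a known adversarial example exists within the perturbation budget.

import torch
import torch.nn as nn


class PlantedAdversarialModel(nn.Module):
    """Wraps a classifier so that x_clean + planted_delta is a guaranteed
    adversarial example. The planted perturbation must lie inside the threat
    model, e.g. ||planted_delta||_inf <= epsilon."""

    def __init__(self, base_model, x_clean, planted_delta, wrong_label,
                 boost=1e4, radius=1e-3):
        super().__init__()
        self.base_model = base_model
        # The point at which the wrapped model is guaranteed to misclassify.
        self.register_buffer("x_target", x_clean + planted_delta)
        self.wrong_label = wrong_label
        self.boost = boost      # how strongly the wrong class is promoted
        self.radius = radius    # numerical tolerance around the planted point

    def forward(self, x):
        logits = self.base_model(x)
        # Distance of each input in the batch to the planted adversarial example.
        dist = (x - self.x_target).flatten(1).norm(dim=1)
        trigger = (dist < self.radius).float()
        # Promote a wrong class only in the vicinity of the planted point, leaving
        # the model's behavior elsewhere unchanged.
        # NOTE: a practical test must keep the planted example discoverable by the
        # attack under evaluation (e.g. via gradients); this hard threshold is only
        # for illustration.
        bonus = torch.zeros_like(logits)
        bonus[:, self.wrong_label] = self.boost * trigger
        return logits + bonus
```

Running a defense's original attack against such a wrapped model and checking whether it finds the planted (or any other in-budget) adversarial example then acts as a unit test for the attack itself, independent of the defense being evaluated.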
