Increasing Confidence in Adversarial Robustness Evaluations

Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations. However, only a handful of these defenses have held up to their claims, because correctly evaluating robustness is extremely challenging: weak attacks often fail to find adversarial examples even when they exist, making a vulnerable network appear robust. In this paper, we propose a test that identifies weak attacks, and thus weak defense evaluations. Our test slightly modifies a neural network so that an adversarial example is guaranteed to exist for every sample. Consequently, any correct attack must succeed in breaking this modified network. For eleven out of thirteen previously published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it. We hope that attack unit tests such as ours will become a major component of future robustness evaluations and increase confidence in an empirical field that is currently riddled with skepticism. Online version & code: zimmerrol.github.io/active-tests/
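
To make the idea concrete, below is a minimal sketch (assuming PyTorch; this is not the authors' actual construction, and names such as `PlantedAdversarialModel`, `planted_delta`, and `wrong_label` are hypothetical) of how a classifier could be wrapped so that, for a given test sample, a known in-budget perturbation is guaranteed to flip the prediction. Any sufficiently strong attack run against such a modified model must then drive robust accuracy to zero on that sample; an attack that fails to do so is flagged as too weak to trust in a defense evaluation.

```python
# Minimal sketch, assuming PyTorch. This is NOT the paper's actual test; the class
# and argument names are hypothetical and only illustrate the core idea: modify a
# network so that a known adversarial example exists within the perturbation budget.

import torch
import torch.nn as nn


class PlantedAdversarialModel(nn.Module):
    """Wraps a classifier so that x_clean + planted_delta is a guaranteed
    adversarial example. The planted perturbation must lie inside the threat
    model, e.g. ||planted_delta||_inf <= epsilon."""

    def __init__(self, base_model, x_clean, planted_delta, wrong_label,
                 boost=1e4, radius=1e-3):
        super().__init__()
        self.base_model = base_model
        # The point at which the wrapped model is guaranteed to misclassify.
        self.register_buffer("x_target", x_clean + planted_delta)
        self.wrong_label = wrong_label
        self.boost = boost      # how strongly the wrong class is promoted
        self.radius = radius    # numerical tolerance around the planted point

    def forward(self, x):
        logits = self.base_model(x)
        # Distance of each input in the batch to the planted adversarial example.
        dist = (x - self.x_target).flatten(1).norm(dim=1)
        trigger = (dist < self.radius).float()
        # Promote a wrong class only in the vicinity of the planted point, leaving
        # the model's behavior elsewhere unchanged.
        # NOTE: a practical test must keep the planted example discoverable by the
        # attack under evaluation (e.g. via gradients); this hard threshold is only
        # for illustration.
        bonus = torch.zeros_like(logits)
        bonus[:, self.wrong_label] = self.boost * trigger
        return logits + bonus
```

Running a defense's original attack against such a wrapped model and checking whether it finds the planted (or any other in-budget) adversarial example then acts as a unit test for the attack itself, independent of the defense being evaluated.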
