On Adaptive Attacks to Adversarial Example Defenses

Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML, and NeurIPS, chosen for illustrative and pedagogical purposes, can be circumvented even though their authors attempted to evaluate them with adaptive attacks. While prior evaluation papers focused mainly on the end result, showing that a defense was ineffective, this paper focuses on laying out the methodology and approach necessary to perform an adaptive attack. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus allow the community to make further progress in building more robust models.

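To make the methodology concrete, below is a minimal sketch (not taken from the paper) of the usual starting point for an adaptive evaluation: an L-infinity PGD attack run end to end against the full defended pipeline rather than against the undefended classifier alone. The defended_model module, the epsilon budget, and the step schedule are illustrative assumptions, written here in PyTorch.

import torch
import torch.nn.functional as F

def adaptive_pgd(defended_model, x, y, eps=8/255, alpha=2/255, steps=40):
    """L-infinity PGD against the end-to-end defended model (illustrative sketch)."""
    # Random start inside the epsilon ball helps escape flat regions of the loss.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = defended_model(x_adv)       # differentiate through the defense itself
        loss = F.cross_entropy(logits, y)    # adaptive attacks often swap in a defense-specific loss
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascend the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project back into the eps ball
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

In practice an adaptive evaluation goes further than this template: the loss function, the number of restarts, and the handling of randomized or non-differentiable defense components are all tailored to the specific defense under study.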