On Adaptive Attacks to Adversarial Example Defenses

Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS, chosen for illustrative and pedagogical purposes, can be circumvented despite attempts to evaluate them with adaptive attacks. While prior evaluation papers focused mainly on the end result (showing that a defense was ineffective), this paper focuses on laying out the methodology and approach necessary to perform an adaptive attack. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
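To make the notion of an adaptive evaluation concrete, below is a minimal sketch of the L-infinity PGD attack loop that most adaptive evaluations start from. This is an illustrative assumption rather than the paper's exact attacks: the model, the loss, and the hyperparameters (eps, alpha, steps) are placeholders, and a genuinely adaptive attack would further tailor the loss or the forward pass to the specific defense under evaluation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=40):
    """Untargeted L-infinity PGD against `model` on inputs `x` with labels `y`.

    Hypothetical baseline attack for illustration only; an adaptive attack
    would modify the loss (or the model's forward pass) to account for the
    defense being evaluated, e.g. a detector or an input transformation.
    """
    x_adv = x.clone().detach()
    # Random start inside the eps-ball, as in standard PGD.
    x_adv = x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)
    x_adv = torch.clamp(x_adv, 0.0, 1.0)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # An adaptive evaluation would typically replace this loss with one
        # targeting the defense, e.g. also pushing a detector's score down.
        loss = F.cross_entropy(logits, y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # Project back into the eps-ball around the clean input
            # and into the valid pixel range.
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
            x_adv = torch.clamp(x_adv, 0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```

The key design point is that nothing in this loop is defense-aware: running it unchanged against a defended model is exactly the kind of non-adaptive (or insufficiently adaptive) evaluation the paper argues is incomplete.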
