Adversarial Risk and the Dangers of Evaluating Against Weak Attacks

This paper investigates recently proposed approaches for defending against adversarial examples and for evaluating adversarial robustness. The existence of adversarial examples in trained neural networks reflects the fact that expected risk alone does not capture a model's performance on worst-case inputs. We motivate the use of adversarial risk as an objective, even though it cannot easily be computed exactly. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate for the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk, appearing robust only because they are obscured to the particular attacks used for evaluation. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to reduce the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.
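
The adversarial risk mentioned above can be written down concretely. One common formalization, assuming a data distribution D, a perturbation neighborhood N(x) around each input, and a per-example loss \ell (the paper's exact notation may differ in details), is

R_{adv}(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{x' \in N(x)} \ell(f_\theta(x'), y) \Big],

whereas attack-based evaluations replace the inner maximum with the loss at the point found by one particular attack, so reported robustness is only a lower bound on the true adversarial risk.

The gradient-free attacks used to expose this gap can be built from standard zeroth-order optimizers. The sketch below is a minimal illustration under stated assumptions, not the paper's exact procedure: it uses SPSA-style finite-difference gradient estimation to maximize a model's loss inside an L-infinity ball of radius epsilon. The names loss_fn, epsilon, step_size, num_samples, and fd_delta are hypothetical placeholders.

import numpy as np

def spsa_attack(x, loss_fn, epsilon=0.05, step_size=0.01,
                num_steps=100, num_samples=32, fd_delta=0.01, rng=None):
    # Maximize loss_fn over the L-infinity ball of radius epsilon around x,
    # using only forward evaluations of loss_fn (no model gradients).
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    x_adv = x.copy()
    for _ in range(num_steps):
        grad_est = np.zeros_like(x_adv)
        for _ in range(num_samples):
            # Random Rademacher direction, as in SPSA.
            v = rng.choice([-1.0, 1.0], size=x_adv.shape)
            # Two-sided finite difference along v estimates the gradient.
            diff = loss_fn(x_adv + fd_delta * v) - loss_fn(x_adv - fd_delta * v)
            grad_est += diff / (2.0 * fd_delta) * v
        grad_est /= num_samples
        # Signed ascent step, then projection back onto the epsilon-ball
        # and onto the valid input range (assumed here to be [0, 1]).
        x_adv = x_adv + step_size * np.sign(grad_est)
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

# Example usage against a hypothetical classifier's loss for the true label:
# x_adv = spsa_attack(x, lambda z: cross_entropy(model(z), y_true), epsilon=8/255)

Because such an attack needs only forward evaluations of the loss, it is unaffected by gradient masking, which is why it can drive the accuracy of seemingly robust defenses toward zero.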
