Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong

Ongoing research has proposed several methods to defend neural networks against adversarial examples, many of which researchers have shown to be ineffective. We ask whether a strong defense can be created by combining multiple (possibly weak) defenses. To answer this question, we study three defenses that follow this approach. Two of these are recently proposed defenses that intentionally combine components designed to work well together. A third defense combines three independent defenses. For all the components of these defenses and the combined defenses themselves, we show that an adaptive adversary can successfully construct adversarial examples with low distortion. Thus, our work implies that an ensemble of weak defenses is not sufficient to provide a strong defense against adversarial examples.
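To make the adaptive-adversary claim concrete, the following is a minimal sketch of the general shape of such an attack: a gradient-based optimizer that simultaneously fools the classifier and suppresses every defense component in the ensemble while keeping the perturbation small. The `classifier`, `detectors`, loss weights, and hyperparameters here are illustrative assumptions for exposition, not the paper's exact formulation.

```python
# Hypothetical sketch of an adaptive attack on an ensemble of defenses.
# Assumes `classifier` maps images to logits and each element of `detectors`
# returns a differentiable scalar score per example (higher = flagged as
# adversarial). All names and weights are illustrative.
import torch
import torch.nn.functional as F

def adaptive_attack(x, y_target, classifier, detectors,
                    steps=1000, lr=0.01, det_weight=1.0, dist_weight=0.1):
    """Jointly minimize the targeted classification loss, every detector's
    score, and the L2 distortion of the perturbation."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = torch.clamp(x + delta, 0.0, 1.0)
        # Push the classifier toward the target label.
        cls_loss = F.cross_entropy(classifier(x_adv), y_target)
        # Penalize each detector's score so no single defense flags x_adv.
        det_loss = sum(d(x_adv).mean() for d in detectors)
        # Keep the perturbation low-distortion.
        dist = delta.flatten(1).norm(dim=1).mean()
        loss = cls_loss + det_weight * det_loss + dist_weight * dist
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.clamp(x + delta.detach(), 0.0, 1.0)
```

The key point the sketch illustrates is that once every component of a combined defense is folded into a single differentiable objective, the ensemble can be attacked as a whole rather than piecewise.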
