Detecting Adversarial Samples for Deep Neural Networks through Mutation Testing

Recently, it has been shown that deep neural networks (DNNs) are subject to attacks through adversarial samples. Adversarial samples are typically crafted through adversarial perturbation, i.e., manipulating an original sample with minor modifications so that the DNN model labels it incorrectly. Since it is almost impossible to train a perfect DNN, adversarial samples have been shown to be easy to generate. As DNNs are increasingly deployed in safety-critical systems such as autonomous cars, it is crucial to develop techniques for defending against such attacks. Existing defense mechanisms, which aim to make adversarial perturbation harder to craft, have been shown to be ineffective. In this work, we propose an alternative approach. We first observe that adversarial samples are much more sensitive to perturbations than normal samples: if we impose random perturbations on a normal sample and an adversarial sample respectively, the ratios of label changes caused by the perturbations differ significantly. Based on this observation, we design a statistical adversary detection algorithm called nMutant, inspired by mutation testing from the software engineering community. Our experiments show that nMutant effectively detects most of the adversarial samples generated by recently proposed attack methods. Furthermore, we provide an error bound with a given statistical significance along with the detection.
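The core observation above (adversarial samples flip their predicted label under small random perturbations far more often than normal samples) can be illustrated with a minimal sketch. This is not the paper's actual nMutant algorithm: the uniform-noise mutation, the function names (`label_change_rate`, `looks_adversarial`, `predict`), and the sample-size and threshold values are all illustrative assumptions.

```python
# Minimal sketch of the label-change-rate idea: randomly perturb ("mutate") an
# input many times and measure how often the model's predicted label changes.
# Adversarial inputs are expected to show a markedly higher change rate.
import numpy as np


def label_change_rate(predict, x, n_mutants=200, step=0.01, rng=None):
    """Estimate how often random perturbations of x change the predicted label.

    predict   : callable mapping a batch of inputs to predicted class labels
    x         : a single input sample (e.g., an image as a numpy array in [0, 1])
    n_mutants : number of randomly perturbed copies ("mutants") to evaluate
    step      : maximum per-element perturbation magnitude (assumed value)
    """
    rng = np.random.default_rng() if rng is None else rng
    original_label = predict(x[None, ...])[0]
    # Uniform noise in [-step, step]; clip back to the valid input range.
    noise = rng.uniform(-step, step, size=(n_mutants,) + x.shape)
    mutants = np.clip(x[None, ...] + noise, 0.0, 1.0)
    changed = np.asarray(predict(mutants)) != original_label
    return changed.mean()


def looks_adversarial(predict, x, threshold=0.05, **kwargs):
    """Flag x as likely adversarial if its label-change rate exceeds a
    threshold calibrated on normal samples (the value here is a placeholder)."""
    return label_change_rate(predict, x, **kwargs) > threshold
```

In the paper, the decision comes with an error bound at a stated statistical significance; the fixed-sample threshold used here is only a simplification of that statistical test.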
