Detection defense against adversarial attacks with saliency map

It is well established that neural networks are vulnerable to adversarial examples: perturbations that are nearly imperceptible to human vision yet cause deep models to misbehave. This phenomenon can lead to severe consequences in safety- and security-critical applications. Existing defenses tend to harden models against adversarial attacks, for example through adversarial training. However, such defenses are often impractical to deploy because of the high cost of re-training and the burden of altering the model architecture or parameters. In this paper, we examine the saliency map from the perspective of model interpretability; much like an attention mechanism, it reveals which regions the deep network relies on when identifying an object. Building on this, we propose a detection method that combines the saliency map with additional input noise and uses an inconsistency criterion to identify adversarial examples. Experiments with representative adversarial attacks on common datasets, including ImageNet, and on popular models show that our method detects all of the attacks with a high detection success rate. Compared with an existing state-of-the-art technique, our method is more general.
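
The following is a minimal sketch of the detection idea summarized above, assuming a PyTorch image classifier. The function names, the vanilla-gradient saliency computation, the noise level `sigma`, the threshold `tau`, and the cosine-similarity consistency measure are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch only: flag an input as adversarial when the saliency map of the
# input and the saliency map of a slightly noised copy are inconsistent.
import torch
import torch.nn.functional as F


def saliency_map(model, x):
    """Vanilla gradient saliency: |d(top-class score) / d(input)|."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)                      # x: (1, C, H, W)
    top_class = logits[0].argmax()
    logits[0, top_class].backward()
    return x.grad.abs().max(dim=1).values  # collapse channels -> (1, H, W)


def detect_adversarial(model, x, sigma=0.05, tau=0.5):
    """Return True when the clean and noised saliency maps disagree
    (low cosine similarity), i.e. the input is flagged as adversarial."""
    s_clean = saliency_map(model, x)
    s_noisy = saliency_map(model, x + sigma * torch.randn_like(x))
    sim = F.cosine_similarity(s_clean.flatten(1), s_noisy.flatten(1), dim=1)
    return bool(sim.item() < tau)
```

In use, something like `detect_adversarial(resnet, img_tensor)` would be called on each suspect input; in practice, `sigma` and `tau` would need to be calibrated on benign data so that clean examples are rarely flagged.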
