NoiseCAM: Explainable AI for the Boundary Between Noise and Adversarial Attacks

Deep Learning (DL) and Deep Neural Networks (DNNs) are widely used across many domains, yet adversarial attacks can easily mislead a neural network into wrong decisions, so defense mechanisms are essential in safety-critical applications. In this paper, we first use the gradient-weighted class activation map (Grad-CAM) to analyze how the behavior of the VGG-16 network deviates when its inputs are mixed with adversarial perturbations or Gaussian noise. In particular, our method locates vulnerable layers that are sensitive to adversarial perturbations and Gaussian noise, and we show that the behavior deviation of these vulnerable layers can be used to detect adversarial examples. Second, we propose a novel NoiseCAM algorithm that integrates information from globally weighted and pixel-level weighted class activation maps; it responds to adversarial perturbations but not to Gaussian random noise mixed into the inputs. Third, we compare adversarial-example detection based on behavior deviation with detection based on NoiseCAM, and show that NoiseCAM achieves better overall performance. Our work provides a useful tool for defending deep neural networks against certain adversarial attacks.
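
To make the behavior-deviation idea concrete, below is a minimal, self-contained PyTorch sketch (not the authors' released code): it computes a Grad-CAM heatmap at a chosen VGG-16 layer and measures how much the map changes when the input is mixed with Gaussian noise versus an FGSM adversarial perturbation. The layer index, noise level, and epsilon are illustrative assumptions, not values reported in the paper, and the detection rule (thresholding the heatmap difference) is a simplification of the paper's behavior-deviation modeling.

```python
# Sketch of per-layer "behavior deviation": compare Grad-CAM heatmaps of a clean
# input against a noisy or adversarially perturbed copy. Assumes torchvision >= 0.13.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()

def grad_cam(x, layer):
    """Grad-CAM heatmap for the top-1 class, taken at `layer` inside model.features."""
    acts, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    logits = model(x)
    score = logits[0, logits[0].argmax()]          # top-1 class score (scalar)
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    a, g = acts[0], grads[0]                       # both (1, C, H, W)
    w = g.mean(dim=(2, 3), keepdim=True)           # globally (GAP) weighted gradients
    cam = F.relu((w * a).sum(dim=1))               # weighted sum over channels
    return cam / (cam.max() + 1e-8)

def deviation(x_clean, x_pert, layer):
    """Mean absolute difference between the two heatmaps. The paper's observation is
    that this is large for adversarial inputs on vulnerable layers but small for
    Gaussian noise, so a simple threshold can flag adversarial examples."""
    return (grad_cam(x_clean, layer) - grad_cam(x_pert, layer)).abs().mean().item()

def fgsm(x, eps):
    """Single-step FGSM perturbation against the model's current top-1 prediction."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    loss = F.cross_entropy(logits, logits.argmax(dim=1))
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# Example usage on a normalized 1x3x224x224 ImageNet image `x`:
# layer = model.features[28]                                    # last conv layer of VGG-16
# print(deviation(x, x + 0.05 * torch.randn_like(x), layer))    # Gaussian noise (sigma assumed)
# print(deviation(x, fgsm(x, eps=4 / 255), layer))              # adversarial input (eps assumed)
```

Repeating the comparison over several convolutional layers is one way to identify which layers are "vulnerable" in the paper's sense, i.e., where the adversarial and Gaussian cases separate most clearly.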
