NIP: Neuron-level Inverse Perturbation Against Adversarial Attacks

Although deep learning models have achieved unprecedented success, their vulnerability to adversarial attacks has attracted increasing attention, especially when they are deployed in security-critical domains. To address this challenge, numerous defense strategies, both reactive and proactive, have been proposed to improve robustness. Many of them operate in the image feature space and cannot reach satisfying results because features shift during their operations; moreover, the features a model learns are not directly tied to its classification results. Different from these methods, we approach defense from inside the model and investigate neuron behaviors before and after attacks. We observe that attacks mislead the model by dramatically changing the neurons that contribute most and least to the correct label. Motivated by this observation, we introduce the concept of neuron influence and divide neurons into front, middle, and tail parts. Building on this, we propose neuron-level inverse perturbation (NIP), the first neuron-level reactive defense method against adversarial attacks. By strengthening front neurons and weakening tail neurons, NIP eliminates nearly all adversarial perturbations while maintaining high benign accuracy. In addition, its adaptivity allows it to cope with perturbations of different sizes, especially larger ones. Comprehensive experiments on three datasets and six models show that NIP outperforms state-of-the-art baselines against eleven adversarial attacks. We further provide interpretable evidence, via neuron activation analysis and visualization, for better understanding.

Impact Statement—Deep learning has attracted tremendous attention in many fields, but studies have shown that deep models are vulnerable to adversarial attacks, and defense methods against such attacks have been developed in parallel. Solutions based on simple image transformations are easy to implement but may be compromised by larger perturbations. Recently proposed feature-space methods project or map the distribution of adversarial examples back to that of benign examples, yet they may fail because features shift while the adversarial examples are processed, and feature distributions do not directly relate to classifications. We propose NIP, a neuron-level defense against generic adversarial attacks, which bridges neuron behaviors and correct classifications from the perspective of the model's interior. By suppressing neurons exploited by attacks and enhancing class-relevant ones, it provides an attack-agnostic, input-aware, and more fine-grained defense.

This research was supported by the National Natural Science Foundation of China under Grant No. 62072406 and the Natural Science Foundation of Zhejiang Province under Grant No. LY19F020025.
R. Chen is with the College of Information Engineering, Zhejiang University of Technology, Hangzhou 310007, China (e-mail: 2112003149@zjut.edu.cn).
H. Jin is with the College of Information Engineering, Zhejiang University of Technology, Hangzhou 310007, China (e-mail: 2112003035@zjut.edu.cn).
J. Chen is with the Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China (e-mail: chenjinyin@zjut.edu.cn).
H. Zheng is with the College of Information Engineering, Zhejiang University of Technology, Hangzhou 310007, China (e-mail: haibinzheng320@gmail.com).
Y. Yue is with the Key Laboratory of Parallel and Distributed Computing, College of Computer, National University of Defense Technology, Changsha 410000, China (e-mail: yuyue@nudt.edu.cn).
S. Ji is with the College of Computer Science and Technology, Zhejiang University, Hangzhou 310007, China (e-mail: sji@zju.edu.cn).
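To make the core idea concrete, the sketch below illustrates, on a toy network, how neuron influence can be computed and how "front" neurons might be strengthened and "tail" neurons weakened. This is a minimal illustration of the concept described above, not the authors' released implementation: the toy one-hidden-layer classifier, the influence formula (activation times outgoing weight to the reference class), and the function names and parameters (neuron_influence, inverse_perturbation, front_frac, tail_frac, boost, suppress) are all assumptions chosen for clarity.

```python
# Illustrative sketch only -- not the authors' implementation of NIP.
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer classifier: x -> ReLU(W1 x + b1) -> W2 h + b2
d_in, d_hidden, n_classes = 32, 64, 10
W1 = rng.normal(scale=0.3, size=(d_hidden, d_in))
b1 = np.zeros(d_hidden)
W2 = rng.normal(scale=0.3, size=(n_classes, d_hidden))
b2 = np.zeros(n_classes)

def hidden(x):
    return np.maximum(0.0, W1 @ x + b1)

def logits(h):
    return W2 @ h + b2

def neuron_influence(h, label):
    # Assumed influence measure: each hidden neuron's contribution to the
    # logit of the reference label (activation times its outgoing weight).
    return h * W2[label]

def inverse_perturbation(h, label, front_frac=0.1, tail_frac=0.1,
                         boost=1.5, suppress=0.5):
    # Strengthen "front" neurons (largest influence) and weaken "tail"
    # neurons (smallest influence); "middle" neurons are left untouched.
    infl = neuron_influence(h, label)
    order = np.argsort(infl)                  # ascending influence
    k_front = max(1, int(front_frac * len(h)))
    k_tail = max(1, int(tail_frac * len(h)))
    h_def = h.copy()
    h_def[order[-k_front:]] *= boost          # front part
    h_def[order[:k_tail]] *= suppress         # tail part
    return h_def

# Usage on a (possibly adversarial) input. A real defense needs a reference
# label or prediction; here we simply reuse the model's current prediction.
x_adv = rng.normal(size=d_in)
label = int(np.argmax(logits(hidden(x_adv))))
h_def = inverse_perturbation(hidden(x_adv), label)
print("original logits:", np.round(logits(hidden(x_adv)), 2))
print("defended logits:", np.round(logits(h_def), 2))
```

In this toy form, the defense amounts to rescaling a small fraction of hidden activations according to their ranked influence; the paper's method additionally adapts the perturbation to each input, which the sketch does not attempt to reproduce.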
