Improving Adversarial Robustness via Channel-wise Activation Suppressing

The study of adversarial examples and their activations has attracted significant attention for secure and robust learning with deep neural networks (DNNs). In contrast to existing works, in this paper we highlight two new characteristics of adversarial examples from the channel-wise activation perspective: 1) the activation magnitudes of adversarial examples are higher than those of natural examples; and 2) the channels are activated more uniformly by adversarial examples than by natural examples. We find that, while adversarial training, the state-of-the-art defense, has addressed the first issue of high activation magnitude by training on adversarial examples, the second issue of uniform activation remains. This motivates us to suppress redundant channels from being activated by adversarial perturbations during the adversarial training process, via a Channel-wise Activation Suppressing (CAS) training strategy. We show that CAS can train a model that inherently suppresses adversarial activations, and can be easily applied to existing defense methods to further improve their robustness. Our work provides a simple but generic training strategy for robustifying the intermediate-layer activations of DNNs.
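
The two channel-wise observations above can be checked empirically on any trained classifier. The following is a minimal sketch, not the authors' released code, of how one might measure them in PyTorch: it hooks an intermediate layer of a ResNet-18 and compares per-channel activation magnitudes between a natural batch and a perturbed one. The chosen layer, the placeholder inputs, and the entropy-based uniformity proxy are illustrative assumptions.

```python
# Sketch of the channel-wise activation analysis (assumptions noted above).
import torch
import torchvision.models as models

model = models.resnet18()  # randomly initialized here; stands in for any trained DNN
model.eval()

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # One magnitude per channel: average |activation| over spatial dims.
        activations[name] = output.detach().abs().mean(dim=(2, 3))
    return hook

# Analyze an intermediate layer (here the last residual stage).
model.layer4.register_forward_hook(make_hook("layer4"))

x_nat = torch.randn(8, 3, 32, 32)                               # placeholder natural batch
x_adv = x_nat + (8 / 255) * torch.sign(torch.randn_like(x_nat)) # placeholder perturbation

with torch.no_grad():
    model(x_nat)
    nat_mag = activations["layer4"].mean(dim=0)  # per-channel magnitude, natural
    model(x_adv)
    adv_mag = activations["layer4"].mean(dim=0)  # per-channel magnitude, adversarial

# Observation 1: overall activation magnitude, adversarial vs. natural.
print("mean channel magnitude  natural %.4f  adversarial %.4f"
      % (nat_mag.mean().item(), adv_mag.mean().item()))

# Observation 2: how uniformly the magnitude is spread across channels
# (normalized entropy close to 1 means near-uniform channel usage).
def channel_uniformity(mag):
    p = mag / mag.sum()
    ent = -(p * p.clamp_min(1e-12).log()).sum()
    return (ent / torch.log(torch.tensor(float(mag.numel())))).item()

print("channel uniformity  natural %.4f  adversarial %.4f"
      % (channel_uniformity(nat_mag), channel_uniformity(adv_mag)))
```

The sketch only reproduces the diagnostic analysis; the CAS strategy itself adds a channel-wise suppression objective at such intermediate layers during adversarial training, so that redundant channels are not activated by adversarial perturbations.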
