Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness

Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients, a type of gradient masking, have previously been found in many defense methods and shown to give a false sense of robustness. In this paper, we identify a more subtle situation, which we call \emph{Imbalanced Gradients}, that can also cause overestimated adversarial robustness. Imbalanced gradients occur when the gradient of one term of the margin loss dominates the other and pushes the attack in a suboptimal direction. To exploit imbalanced gradients, we formulate a \emph{Margin Decomposition (MD)} attack that decomposes the margin loss into its individual terms and then explores the attackability of each term separately via a two-stage process. We examine 12 state-of-the-art defense models and find that models exploiting label smoothing are especially prone to imbalanced gradients; on these models, our MD attack reduces robustness, as evaluated by the PGD attack, by more than 23%. For 6 of the 12 defenses, our attack reduces PGD-evaluated robustness by at least 9%. These results suggest that imbalanced gradients need to be carefully addressed for more reliable evaluation of adversarial robustness.
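To make the idea concrete, the sketch below shows what a two-stage margin-decomposition attack might look like as PGD-style code in PyTorch, assuming the standard CW margin loss $\max_{j\neq y} z_j(x) - z_y(x)$ with its two terms. The stage split (stage 1 attacks a single decomposed term, stage 2 attacks the full margin loss), the choice of which term to attack first, and the hyperparameters (eps, alpha, n1, n2) are illustrative assumptions, not the exact MD algorithm of the paper.

```python
import torch
import torch.nn.functional as F

def md_attack_sketch(model, x, y, eps=8/255, alpha=2/255, n1=20, n2=30):
    """Illustrative two-stage margin-decomposition PGD sketch (not the
    paper's exact algorithm). Stage 1 maximizes a single decomposed term
    of the margin loss; stage 2 switches to the full margin loss."""
    # Random start inside the L-inf ball of radius eps around x.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()

    for step in range(n1 + n2):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        onehot = F.one_hot(y, logits.size(1)).bool()
        z_true = logits[onehot]                                        # z_y
        z_other = logits.masked_fill(onehot, float('-inf')).max(1).values

        if step < n1:
            # Stage 1: attack one term only (here, suppress the true-class
            # logit), so its gradient cannot be drowned out by the other term.
            loss = -z_true.sum()
        else:
            # Stage 2: attack the full margin loss, max_{j != y} z_j - z_y.
            loss = (z_other - z_true).sum()

        grad = torch.autograd.grad(loss, x_adv)[0]
        # Signed gradient ascent step, then project back into the eps ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

Separating the two terms for part of the attack budget is one simple way to avoid following a gradient that is dominated by a single term for the entire optimization.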
