Adaptive iterative attack towards explainable adversarial robustness

Abstract Image classifiers based on deep neural networks are severely vulnerable to deliberately crafted adversarial examples. Designing more effective and efficient adversarial attacks has attracted considerable interest, because such attacks can help interpret deep learning models and validate the robustness of neural networks. However, current iterative attacks add noise with a fixed step size at every step, leaving the effect of a variable step size on model robustness largely unexplored. We prove that, when the upper bound on the noise added to the original image is fixed, the attack can be strengthened by making the step size positively correlated with the gradient obtained at each step by querying the target model. Building on this result, we propose Ada-FGSM (Adaptive FGSM), a new iterative attack that adaptively allocates the step size of the noise according to the gradient information at each step. Improvements in attack success rate and in the accuracy drop of multiple models on ImageNet confirm the validity of our method. We further analyze the iterative attack process by visualizing its trajectory and gradient contours, and explain why deep neural networks are vulnerable to adversarial examples generated with variable step sizes.
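
The abstract does not spell out Ada-FGSM's exact allocation rule, so the following is only a minimal PyTorch sketch of the general idea: an iterative FGSM whose per-step size grows with the gradient magnitude observed at that step, while the usual L-infinity projection keeps the total perturbation within the fixed budget eps. The function name adaptive_fgsm, the base_alpha parameter, and the moving-average scaling rule are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn.functional as F


def adaptive_fgsm(model, x, y, eps=8 / 255, steps=10, base_alpha=None):
    """Sketch of an adaptive-step-size iterative FGSM attack (untargeted, L-inf).

    The step size of each iteration is scaled by the gradient magnitude of the
    current query, so steps with a stronger gradient signal move further. The
    moving-average scaling below is a placeholder heuristic, not the exact
    allocation scheme of Ada-FGSM.
    """
    base_alpha = base_alpha if base_alpha is not None else eps / steps  # uniform I-FGSM baseline
    x_adv = x.clone().detach()
    g_bar = None  # running average of the gradient magnitude across iterations

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]

        # Per-image gradient magnitude of this query, compared against its
        # running average so the first step defaults to base_alpha.
        g_mag = grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        g_bar = g_mag if g_bar is None else 0.9 * g_bar + 0.1 * g_mag
        alpha = base_alpha * g_mag / (g_bar + 1e-12)

        # Signed-gradient step, then project back into the eps-ball around x
        # and into the valid pixel range, enforcing the fixed noise budget.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = x_adv.clamp(0.0, 1.0)

    return x_adv.detach()
```

Under this heuristic, iterations where the loss surface is steep consume more of the perturbation budget, which is one plausible reading of the "positively correlated with the gradient" condition stated in the abstract; the projection step still guarantees the fixed upper bound on the total noise.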
