Adversarial Defense via Learning to Generate Diverse Attacks

With the remarkable success of deep learning, Deep Neural Networks (DNNs) have become the dominant tools across various machine learning domains. Despite this success, however, DNNs have been found to be surprisingly vulnerable to malicious attacks: adding small, perceptually indistinguishable perturbations to the data can easily degrade classification performance. Adversarial training is an effective defense strategy for training a robust classifier. In this work, we propose to utilize a generator to learn how to create adversarial examples. Unlike existing approaches that create a one-shot perturbation with a deterministic generator, we propose a recursive and stochastic generator that produces much stronger and more diverse perturbations, comprehensively revealing the vulnerability of the target classifier. Our experimental results on the MNIST and CIFAR-10 datasets show that a classifier adversarially trained with our method is more robust against various white-box and black-box attacks.
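The abstract describes the approach only at a high level: a stochastic perturbation generator, applied recursively, is trained jointly with the classifier in an adversarial-training loop. The PyTorch sketch below is a rough illustration of that idea under our own assumptions; the module names, network architecture, perturbation budget, and number of recursive steps (PerturbGenerator, EPS, N_STEPS, Z_DIM) are illustrative choices, not the paper's exact formulation.

```python
# Minimal sketch: adversarial training with a learned stochastic, recursive
# perturbation generator. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

EPS = 0.3        # L-inf perturbation budget (MNIST-scale assumption)
N_STEPS = 3      # number of recursive generator steps (assumption)
Z_DIM = 8        # dimensionality of the random code that drives diversity

class PerturbGenerator(nn.Module):
    """Maps (current adversarial image, noise code) to a bounded perturbation."""
    def __init__(self, in_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch + Z_DIM, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, in_ch, 3, padding=1),
        )

    def forward(self, x, z):
        # Broadcast the noise code spatially and concatenate it with the image,
        # then squash the output so each step stays within the budget.
        z_map = z[:, :, None, None].expand(-1, -1, x.size(2), x.size(3))
        return torch.tanh(self.net(torch.cat([x, z_map], dim=1))) * EPS

def generate_adversarial(gen, x, n_steps=N_STEPS):
    """Recursively refine the adversarial example, sampling fresh noise each step."""
    x_adv = x
    for _ in range(n_steps):
        z = torch.randn(x.size(0), Z_DIM, device=x.device)
        delta = torch.clamp(x_adv + gen(x_adv, z) - x, -EPS, EPS)  # project to budget
        x_adv = torch.clamp(x + delta, 0.0, 1.0)                   # keep a valid image
    return x_adv

def train_step(classifier, gen, opt_c, opt_g, x, y):
    # 1) Update the generator to maximize the classifier's loss on its outputs.
    g_loss = -F.cross_entropy(classifier(generate_adversarial(gen, x)), y)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    # 2) Update the classifier on clean data plus freshly generated adversarial data.
    x_adv = generate_adversarial(gen, x).detach()
    c_loss = F.cross_entropy(classifier(x), y) + F.cross_entropy(classifier(x_adv), y)
    opt_c.zero_grad(); c_loss.backward(); opt_c.step()
    return c_loss.item()
```

Conditioning the generator on a random code z is what lets it draw multiple distinct attacks for the same input, and feeding its own output back in for several steps is one way to realize the "recursive" refinement the abstract mentions.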
