Efficient Adversarial Training With Transferable Adversarial Examples

Adversarial training is an effective defense method for protecting classification models against adversarial attacks. However, one limitation of this approach is that it can require orders of magnitude more training time than standard training, due to the high cost of generating strong adversarial examples at every epoch. In this paper, we first show that there is high transferability between models from neighboring epochs of the same training process, i.e., adversarial examples generated in one epoch remain adversarial in subsequent epochs. Leveraging this property, we propose a novel method, Adversarial Training with Transferable Adversarial Examples (ATTA), that enhances the robustness of trained models and greatly improves training efficiency by accumulating adversarial perturbations across epochs. Compared to state-of-the-art adversarial training methods, ATTA improves adversarial accuracy by up to 7.2% on CIFAR10 and requires 12~14x less training time on the MNIST and CIFAR10 datasets while achieving comparable model robustness.
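
To make the "accumulating perturbations across epochs" idea concrete, here is a minimal sketch of what such a training loop could look like. It assumes a PGD-style inner attack, a data loader that yields example indices alongside images and labels, and illustrative hyperparameter names (`epsilon`, `alpha`, `attack_steps`); these names and the per-example storage scheme are assumptions for illustration, not the authors' implementation.

```python
# Sketch: warm-start each epoch's attack with the perturbation carried over
# from the previous epoch instead of regenerating it from scratch.
import torch
import torch.nn.functional as F

def attack_step(model, x, delta, y, alpha, epsilon):
    """One PGD step that refines a carried-over perturbation `delta`."""
    delta = delta.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x + delta), y)
    grad, = torch.autograd.grad(loss, delta)
    delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon)
    # Keep the perturbed image inside the valid pixel range [0, 1].
    delta = (x + delta).clamp(0.0, 1.0) - x
    return delta.detach()

def accumulated_adv_training(model, loader, optimizer, epochs,
                             epsilon=8 / 255, alpha=2 / 255, attack_steps=1):
    # Per-example perturbations are stored and reused across epochs; because
    # adversarial examples transfer between neighboring epochs, a few attack
    # steps per epoch are enough to keep them strong.
    deltas = {}
    for _ in range(epochs):
        for idx, x, y in loader:  # loader is assumed to yield example indices
            key = tuple(idx.tolist())
            delta = deltas.get(key, torch.zeros_like(x))
            for _ in range(attack_steps):
                delta = attack_step(model, x, delta, y, alpha, epsilon)
            deltas[key] = delta  # carry the perturbation into the next epoch
            optimizer.zero_grad()
            F.cross_entropy(model(x + delta), y).backward()
            optimizer.step()
```

The key design point illustrated here is that the expensive multi-step attack of standard adversarial training is replaced by a small number of refinement steps per epoch, with the accumulated perturbation acting as a warm start.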
