Attacks Which Do Not Kill Training Make Adversarial Learning Stronger

Adversarial training based on the minimax formulation is necessary for obtaining the adversarial robustness of trained models. However, it is conservative or even pessimistic, so it sometimes hurts natural generalization. In this paper, we raise a fundamental question: do we have to trade off natural generalization for adversarial robustness? We argue that adversarial training should employ confident adversarial data for updating the current model. We propose a novel formulation called friendly adversarial training (FAT): rather than employing the most adversarial data that maximize the loss, we search for the least adversarial data (i.e., friendly adversarial data) that minimize the loss, among the adversarial data that are confidently misclassified. This formulation is easy to implement: we simply stop adversarial-data search algorithms such as PGD (projected gradient descent) early, a procedure we call early-stopped PGD. Theoretically, FAT is justified by an upper bound on the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively: adversarial robustness can indeed be achieved without compromising natural generalization.
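The following is a minimal PyTorch sketch of the early-stopped PGD idea described above: instead of always running the full budget of PGD iterations to maximize the loss, the search is halted for each example as soon as it becomes misclassified (optionally after a few extra steps). The function name `early_stopped_pgd`, the slack parameter `tau`, and the hyperparameter values are illustrative assumptions, not the authors' reference implementation; inputs are assumed to be image batches in NCHW format with pixel values in [0, 1].

```python
import torch
import torch.nn.functional as F

def early_stopped_pgd(model, x, y, eps=8/255, step_size=2/255, max_steps=10, tau=0):
    """Sketch of early-stopped PGD: return 'friendly' adversarial examples,
    i.e., least adversarial data that are already misclassified, by stopping
    the PGD search early on a per-example basis (names/values are assumptions)."""
    model.eval()
    x_adv = x.clone().detach()
    # Number of extra PGD steps each example may take after it becomes misclassified.
    budget = torch.full((x.size(0),), tau, device=x.device)
    for _ in range(max_steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        loss = F.cross_entropy(logits, y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            wrong = logits.argmax(dim=1) != y           # already misclassified?
            budget = torch.where(wrong, budget - 1, budget)
            active = budget >= 0                        # still allowed to move
            step = step_size * grad.sign()
            # Update only the examples that have not yet exhausted their budget.
            x_adv = torch.where(active.view(-1, 1, 1, 1), x_adv + step, x_adv)
            # Project back into the eps-ball around x and the valid pixel range.
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        x_adv = x_adv.detach()
        if (budget < 0).all():                          # every example stopped early
            break
    return x_adv
```

In friendly adversarial training, the model would then be updated on the output of such a routine rather than on fully maximized PGD adversaries, which is what keeps the attack from "killing" training while still providing adversarial signal.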
