Efficient Robust Training via Backward Smoothing

Adversarial training is so far the most effective strategy for defending against adversarial examples. However, it suffers from high computational cost due to the iterative adversarial attacks performed in each training step. Recent studies show that Fast Adversarial Training can be achieved by performing a single-step attack with random initialization, yet it remains unclear why random initialization helps. Moreover, such an approach still lags behind state-of-the-art adversarial training algorithms in both stability and model robustness. In this work, we develop a new understanding of Fast Adversarial Training by viewing random initialization as performing randomized smoothing for better optimization of the inner maximization problem. From this perspective, we show that the smoothing effect of random initialization is insufficient under the adversarial perturbation constraint. We propose a new initialization strategy, backward smoothing, to address this issue; it significantly improves both stability and model robustness over single-step robust training methods. Experiments on multiple benchmarks demonstrate that our method achieves robustness similar to the original TRADES method while using much less training time ($\sim$3x speedup under the same training schedule).
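For concreteness, the sketch below illustrates the single-step Fast Adversarial Training baseline the abstract refers to: a one-step attack started from a random point inside the perturbation ball, followed by a standard training update on the resulting adversarial example. This is a minimal, hedged illustration in PyTorch-style Python; the function name `fgsm_rs_step` and the parameters `epsilon` and `alpha` are illustrative, and the backward smoothing initialization proposed in the paper is not reproduced here since its details are not given in this abstract.

```python
import torch
import torch.nn.functional as F

def fgsm_rs_step(model, x, y, epsilon, alpha):
    """Single-step robust training with random initialization (baseline sketch)."""
    # Random initialization inside the L_inf ball of radius epsilon.
    delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)

    # Single-step (FGSM-style) maximization of the loss w.r.t. the perturbation.
    loss = F.cross_entropy(model(x + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon).detach()
    x_adv = (x + delta).clamp(0.0, 1.0)

    # Outer minimization: train the model on the adversarial example.
    return F.cross_entropy(model(x_adv), y)
```

The random initialization plays the role of a crude randomized-smoothing step for the inner maximization; the paper argues this smoothing is insufficient under the perturbation constraint and replaces it with the backward smoothing initialization.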
