Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks

In this paper, we study fast training of adversarially robust models. From an analysis of the state-of-the-art defense, multi-step adversarial training~\cite{madry2017towards}, we hypothesize that the gradient magnitude is linked to model robustness. Motivated by this, we propose to perturb both the image and the label during training, which we call Bilateral Adversarial Training (BAT). To generate the adversarial label, we derive a closed-form heuristic solution. To generate the adversarial image, we use a one-step targeted attack whose target is the most confusing class. In our experiments, we first show that the random start and the most-confusing-class targeted attack effectively prevent the label leaking and gradient masking problems. Coupled with the adversarial label, our model then significantly improves on state-of-the-art results. For example, against the PGD100 white-box attack with cross-entropy loss, we achieve 63.7\% versus 47.2\% on CIFAR10, and 59.1\% versus 42.1\% on SVHN. Finally, experiments on the computationally challenging ImageNet dataset further demonstrate the effectiveness of our fast method.
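To make the procedure concrete, here is a minimal PyTorch sketch of one BAT training step. The function name `bat_loss` and the settings `eps = 8/255` and `alpha = 0.9` are illustrative assumptions rather than the paper's exact values, and the uniform label smoothing below stands in for the closed-form adversarial-label heuristic derived in the paper.

```python
import torch
import torch.nn.functional as F

def bat_loss(model, x, y, eps=8/255, alpha=0.9, num_classes=10):
    """One Bilateral Adversarial Training step (illustrative sketch)."""
    # Adversarial image: random start inside the eps-ball, then a
    # single targeted step toward the most confusing class.
    delta = torch.empty_like(x).uniform_(-eps, eps)
    x_adv = (x + delta).clamp(0, 1).detach().requires_grad_(True)

    logits = model(x_adv)
    # Most confusing class: highest-scoring class other than the true label.
    masked = logits.detach().clone()
    masked.scatter_(1, y.unsqueeze(1), float("-inf"))
    target = masked.argmax(dim=1)

    # Targeted step: descend the cross-entropy w.r.t. the target class.
    grad = torch.autograd.grad(F.cross_entropy(logits, target), x_adv)[0]
    x_adv = x_adv.detach() - eps * grad.sign()
    # Project back into the eps-ball around x and the valid pixel range.
    x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    # Adversarial label: soften the one-hot target. Uniform smoothing is
    # used here as a stand-in for the paper's closed-form heuristic,
    # which distributes the off-label mass non-uniformly.
    y_onehot = F.one_hot(y, num_classes).float()
    y_soft = alpha * y_onehot + (1 - alpha) / (num_classes - 1) * (1 - y_onehot)

    # Cross-entropy between the softened label and the adversarial image.
    logp = F.log_softmax(model(x_adv), dim=1)
    return -(y_soft * logp).sum(dim=1).mean()
```

In a training loop this replaces the clean-example loss: call `bat_loss(model, images, labels).backward()` and then take an optimizer step. Because only a single attack step is taken per batch, the cost per epoch stays close to standard training, which is what makes the method fast relative to multi-step adversarial training.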

[1]  Moustapha Cissé,et al.  Countering Adversarial Images using Input Transformations , 2018, ICLR.

[2] Yang Song, et al. PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples, 2017, ICLR.

[3] Seyed-Mohsen Moosavi-Dezfooli, et al. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks, 2016, CVPR.

[4] Fabio Roli, et al. Evasion Attacks against Machine Learning at Test Time, 2013, ECML/PKDD.

[6] Samy Bengio, et al. Adversarial Machine Learning at Scale, 2016, ICLR.

[7] David Wagner, et al. Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods, 2017, AISec@CCS.

[8] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, CVPR.

[9] David A. Wagner, et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, 2018, ICML.

[10] Dawn Song, et al. Robust Physical-World Attacks on Deep Learning Models, 2017, ArXiv abs/1707.08945.

[11] David A. Wagner, et al. Towards Evaluating the Robustness of Neural Networks, 2017, IEEE Symposium on Security and Privacy (SP).

[12] Jinfeng Yi, et al. ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models, 2017, AISec@CCS.

[13] Aleksander Madry, et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.

[14] John C. Duchi, et al. Certifying Some Distributional Robustness with Principled Adversarial Training, 2017, ICLR.

[15] Aleksander Madry, et al. Adversarially Robust Generalization Requires More Data, 2018, NeurIPS.

[16] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[17] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.

[18] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.

[19] Jinfeng Yi, et al. EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples, 2017, AAAI.

[20] Andrew Slavin Ross, et al. Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients, 2017, AAAI.

[21] Fabio Roli, et al. Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning, 2017, Pattern Recognition.

[22] Raja Giryes, et al. Improving DNN Robustness to Adversarial Attacks using Jacobian Regularization, 2018, ECCV.

[23] Jan Hendrik Metzen, et al. On Detecting Adversarial Perturbations, 2017, ICLR.

[24] Moustapha Cissé, et al. Houdini: Fooling Deep Structured Prediction Models, 2017, ArXiv.

[25] Thomas Brox, et al. Universal Adversarial Perturbations Against Semantic Image Segmentation, 2017, ICCV.

[26] Jun Zhu, et al. Boosting Adversarial Attacks with Momentum, 2018, CVPR.

[27] Aditi Raghunathan, et al. Certified Defenses against Adversarial Examples, 2018, ICLR.

[28] Michael I. Jordan, et al. Theoretically Principled Trade-off between Robustness and Accuracy, 2019, ICML.

[29] James A. Storer, et al. Deflecting Adversarial Attacks with Pixel Deflection, 2018, CVPR.

[31] Harini Kannan, et al. Adversarial Logit Pairing, 2018, NIPS.

[32] Abhimanyu Dubey, et al. Defense Against Adversarial Images Using Web-Scale Nearest-Neighbor Search, 2019, CVPR.

[33] Cho-Jui Hsieh, et al. Towards Robust Neural Networks via Random Self-ensemble, 2017, ECCV.

[34] Alan L. Yuille, et al. Adversarial Examples for Semantic Segmentation and Object Detection, 2017, ICCV.

[35] J. Doug Tygar, et al. Adversarial machine learning, 2011, AISec '11.

[36] Sandy H. Huang, et al. Adversarial Attacks on Neural Network Policies, 2017, ICLR.

[37] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.

[38] Ananthram Swami, et al. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks, 2016, IEEE Symposium on Security and Privacy (SP).

[39] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.

[40] Ananthram Swami, et al. Practical Black-Box Attacks against Machine Learning, 2016, AsiaCCS.

[41] Alan L. Yuille, et al. Mitigating adversarial effects through randomization, 2017, ICLR.

[42] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.

[43] Jinfeng Yi, et al. Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning, 2017, ACL.

[44] Joan Bruna, et al. Intriguing properties of neural networks, 2013, ICLR.

[45] Abhinav Gupta, et al. Robust Adversarial Reinforcement Learning, 2017, ICML.

[46] Dan Boneh, et al. Ensemble Adversarial Training: Attacks and Defenses, 2017, ICLR.

[47] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.

[48] Andrew Y. Ng, et al. Reading Digits in Natural Images with Unsupervised Feature Learning, 2011.

[49] Pushmeet Kohli, et al. Adversarial Risk and the Dangers of Evaluating Against Weak Attacks, 2018, ICML.

[50] Rama Chellappa, et al. Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models, 2018, ICLR.

[51] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[52] Logan Engstrom, et al. Synthesizing Robust Adversarial Examples, 2017, ICML.

[53] Hao Chen, et al. MagNet: A Two-Pronged Defense against Adversarial Examples, 2017, CCS.

[54] Dawn Xiaodong Song, et al. Delving into Transferable Adversarial Examples and Black-box Attacks, 2016, ICLR.

[55] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[56] Alan L. Yuille, et al. Feature Denoising for Improving Adversarial Robustness, 2019, CVPR.

[57] Samy Bengio, et al. Adversarial examples in the physical world, 2016, ICLR.

[58] Matthias Bethge, et al. Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models, 2017, ICLR.

[59] Aleksander Madry, et al. A Rotation and a Translation Suffice: Fooling CNNs with Simple Transformations, 2017, ArXiv.

[60] Pedro M. Domingos, et al. Adversarial classification, 2004, KDD.

[61] Xiaolin Hu, et al. Defense Against Adversarial Attacks Using High-Level Representation Guided Denoiser, 2018, CVPR.
