Mutual Adversarial Training: Learning together is better than going alone

Recent studies have shown that robustness to adversarial attacks can be transferred across networks. In other words, we can make a weak model more robust with the help of a strong teacher model. We ask whether, instead of learning from a static teacher, models can “learn together” and “teach each other” to achieve better robustness. In this paper, we study how interactions among models affect robustness via knowledge distillation. We propose mutual adversarial training (MAT), in which multiple models are trained together and share the knowledge of adversarial examples to achieve improved robustness. MAT allows robust models to explore a larger space of adversarial samples and to find more robust feature spaces and decision boundaries. Through extensive experiments on CIFAR-10 and CIFAR-100, we demonstrate that MAT can effectively improve model robustness and outperform state-of-the-art methods under white-box attacks, bringing an ∼8% accuracy gain over vanilla adversarial training (AT) under PGD-100 attacks. In addition, we show that MAT can also mitigate the robustness trade-off among different perturbation types, bringing as much as a 13.1% accuracy gain over AT baselines against the union of l∞, l2, and l1 attacks. These results show the superiority of the proposed method and demonstrate that collaborative learning is an effective strategy for designing robust models.
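
Below is a minimal, hypothetical sketch of what a MAT-style training step could look like for the two-model case, written in PyTorch. The PGD settings, the loss weight beta, and the use of a KL-divergence distillation term on adversarial inputs are illustrative assumptions, not the paper's exact formulation; the actual method may use more peers and different hyperparameters.

```python
# Hypothetical MAT sketch: two peer models share knowledge of adversarial
# examples through soft-label distillation. All hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft l_inf PGD adversarial examples against `model`."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def mat_step(models, optimizers, x, y, beta=1.0):
    """One mutual adversarial training step for a pair of models: each model
    trains on its own adversarial examples while distilling from its peer's
    soft predictions on those same examples."""
    x_advs = [pgd_attack(m, x, y) for m in models]  # each model attacks itself
    for i, (model, opt) in enumerate(zip(models, optimizers)):
        peer = models[1 - i]
        logits = model(x_advs[i])
        with torch.no_grad():
            peer_probs = F.softmax(peer(x_advs[i]), dim=1)
        # Adversarial cross-entropy plus a KL term that pulls the model toward
        # its peer's predictions, so the two networks teach each other.
        loss = F.cross_entropy(logits, y) + beta * F.kl_div(
            F.log_softmax(logits, dim=1), peer_probs, reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In this sketch, each network acts simultaneously as a student (distilling from its peer on adversarial inputs) and as a teacher (providing soft targets), which is what allows the models to share knowledge of adversarial examples rather than learning from a fixed, pre-trained teacher.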
