CARTL: Cooperative Adversarially-Robust Transfer Learning

Transfer learning eases the burden of training a well-performing model from scratch, especially when training data is scarce and computational power is limited. In deep learning, a typical transfer-learning strategy is to freeze the early layers of a pre-trained model and fine-tune the remaining layers on the target domain. Previous work focuses on the accuracy of the transferred model but neglects the transfer of adversarial robustness. In this work, we first show that transfer learning improves accuracy on the target domain but degrades the inherited robustness of the target model. To address this problem, we propose cooperative adversarially-robust transfer learning (CARTL), a novel scheme that pre-trains the model via feature distance minimization and then fine-tunes it on target-domain tasks under a non-expansive constraint. Empirical results show that CARTL improves the inherited robustness by up to about 28% over the baseline at the same level of accuracy. Furthermore, we study the relationship between batch normalization (BN) layers and robustness in the context of transfer learning, and reveal that freezing the BN layers can further boost robustness transfer.
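
The abstract names the two training stages without spelling them out, so the sketch below gives one plausible PyTorch reading rather than the paper's exact formulation. Here "feature distance minimization" is taken as an L2 penalty between penultimate-layer features of clean and PGD-crafted adversarial inputs during source-domain pre-training, and "non-expansive fine-tuning" is approximated by spectral normalization on a new target-domain head; all names and hyperparameters (pgd_attack, lam, eps, the 512/10 head shape) are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
        # Standard L-infinity PGD used to craft adversarial examples.
        delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
        for _ in range(steps):
            loss = F.cross_entropy(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
        return (x + delta).clamp(0.0, 1.0).detach()

    def pretrain_step(features, head, x, y, opt, lam=1.0):
        # Stage 1 (source domain): adversarial training with an extra term
        # that pulls adversarial features toward their clean counterparts
        # (one reading of "feature distance minimization").
        model = nn.Sequential(features, head)
        x_adv = pgd_attack(model, x, y)
        f_clean = features(x)
        f_adv = features(x_adv)
        loss = F.cross_entropy(head(f_adv), y) + lam * F.mse_loss(f_adv, f_clean)
        opt.zero_grad()
        loss.backward()
        opt.step()

    def finetune_step(features, head, x, y, opt):
        # Stage 2 (target domain): the feature extractor is frozen in eval
        # mode, which also fixes the BN running statistics; only the new
        # head is trained.
        features.eval()
        with torch.no_grad():
            f = features(x)
        loss = F.cross_entropy(head(f), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Hypothetical non-expansive target head: spectral normalization caps
    # the layer's Lipschitz constant near 1, so the fine-tuned part does
    # not amplify perturbations surviving in the inherited features.
    target_head = nn.utils.spectral_norm(nn.Linear(512, 10))

Running pretrain_step over the source dataset and then finetune_step (with target_head and its own optimizer) over the target dataset reproduces the freeze-early-layers recipe described above, with robustness-oriented constraints in both stages; keeping the feature extractor in eval mode during fine-tuning also matches the observation that frozen BN layers help robustness transfer.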
