Headless Horseman: Adversarial Attacks on Transfer Learning Models

Transfer learning facilitates the training of task-specific classifiers using pre-trained models as feature extractors. We present a family of transferable adversarial attacks against such classifiers, generated without access to the classification head; we call these headless attacks. We first demonstrate successful transfer attacks against a victim network using only its feature extractor. This motivates the introduction of a label-blind adversarial attack. This transfer attack method does not require any information about the class-label space of the victim. Our attack lowers the accuracy of a ResNet18 trained on CIFAR10 by over 40%.

[1]  Seyed-Mohsen Moosavi-Dezfooli,et al.  DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Dawn Xiaodong Song,et al.  Exploring the Space of Black-box Attacks on Deep Neural Networks , 2017, ArXiv.

[4]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Samy Bengio,et al.  Adversarial Machine Learning at Scale , 2016, ICLR.

[7]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Anil K. Jain,et al.  Artificial neural networks for feature extraction and multivariate data projection , 1995, IEEE Trans. Neural Networks.

[9]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[10]  Yoshua Bengio,et al.  Deep Learning of Representations for Unsupervised and Transfer Learning , 2011, ICML Unsupervised and Transfer Learning.

[11]  Aleksander Madry,et al.  Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors , 2018, ICLR.

[12]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[13]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[14]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[15]  Ananthram Swami,et al.  Practical Black-Box Attacks against Machine Learning , 2016, AsiaCCS.

[16]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[17]  Patrick D. McDaniel,et al.  Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples , 2016, ArXiv.

[18]  Ben Y. Zhao,et al.  With Great Training Comes Great Vulnerability: Practical Attacks against Transfer Learning , 2018, USENIX Security Symposium.

[19]  Fabio Roli,et al.  Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks , 2018, USENIX Security Symposium.