Stealthy Backdoors as Compression Artifacts

In a backdoor attack on a machine learning model, an adversary produces a model that performs well on normal inputs but outputs targeted misclassifications on inputs containing a small trigger pattern. Model compression is a widely used approach for reducing the size of deep learning models with little accuracy loss, enabling resource-hungry models to run on resource-constrained devices. In this paper, we study the risk that model compression could provide an opportunity for adversaries to inject stealthy backdoors. We design stealthy backdoor attacks such that the full-sized model released by the adversary appears to be free of backdoors (even when tested using state-of-the-art techniques), but when the model is compressed it exhibits highly effective backdoors. We show this can be done for two common model compression techniques: model pruning and model quantization. Our findings demonstrate how an adversary may be able to hide a backdoor as a compression artifact, and show the importance of performing security tests on the models that will actually be deployed, not their pre-compression versions.
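To make the recommended deployment-time check concrete, the sketch below evaluates a released model and its compressed counterparts on trigger-stamped inputs. This is a minimal illustration, assuming PyTorch's built-in dynamic quantization (torch.quantization.quantize_dynamic) and L1 unstructured pruning (torch.nn.utils.prune.l1_unstructured); the SmallNet architecture, stamp_trigger pattern, target class, and random evaluation batch are hypothetical placeholders, not the attack or evaluation pipeline from the paper.

    # Minimal sketch: test the compressed artifact, not just the released model.
    # SmallNet, the trigger, the target class, and the data are placeholders.
    import copy
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    class SmallNet(nn.Module):
        """Stand-in for a released full-precision model."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.net = nn.Sequential(
                nn.Flatten(),
                nn.Linear(3 * 32 * 32, 128),
                nn.ReLU(),
                nn.Linear(128, num_classes),
            )

        def forward(self, x):
            return self.net(x)

    def stamp_trigger(x, value=1.0):
        """Stamp a small square patch (the trigger) into each image's corner."""
        x = x.clone()
        x[:, :, -4:, -4:] = value
        return x

    def attack_success_rate(model, x, target_label):
        """Fraction of trigger-stamped inputs classified as the attacker's target."""
        with torch.no_grad():
            preds = model(stamp_trigger(x)).argmax(dim=1)
        return (preds == target_label).float().mean().item()

    model = SmallNet().eval()              # in practice: the released model
    x = torch.rand(64, 3, 32, 32)          # placeholder evaluation batch
    target = 0                             # hypothetical attacker target class

    # Compressed copy 1: post-training dynamic quantization of linear layers.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    # Compressed copy 2: magnitude (L1) pruning of half the weights per layer.
    pruned = copy.deepcopy(model)
    for module in pruned.net:
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.5)

    # The abstract's point: a backdoor may surface only after compression, so
    # the attack success rate must be measured on each deployed artifact.
    print("ASR full-precision:", attack_success_rate(model, x, target))
    print("ASR quantized:     ", attack_success_rate(quantized, x, target))
    print("ASR pruned:        ", attack_success_rate(pruned, x, target))

In practice, such a check would use the exact compression pipeline intended for deployment and a realistic evaluation set, since the attacks described here are tuned to a specific compression method.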
