Blind Backdoors in Deep Learning Models

We investigate a new method for injecting backdoors into machine learning models, based on poisoning the loss-value computation in the model-training code. We use it to demonstrate new classes of backdoors strictly more powerful than those in the prior literature: single-pixel and physical backdoors in ImageNet models, backdoors that switch the model to a covert, privacy-violating task, and backdoors that do not require inference-time input modifications. Our attack is \emph{blind}: the attacker can neither modify the training data, nor observe the execution of their injected code, nor access the resulting model. Blind backdoor training uses multi-objective optimization to achieve high accuracy on both the main and the backdoor tasks. Finally, we show how the blind attack can evade all known defenses, and propose new defenses.
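To make the loss-value poisoning idea concrete, the sketch below shows a compromised loss routine that synthesizes backdoor inputs and labels on the fly and blends the backdoor loss with the main-task loss. This is a minimal illustration, not the paper's actual code: the trigger function `add_trigger`, the target label `BACKDOOR_LABEL`, and the fixed blending weight `ALPHA` are hypothetical placeholders, and the actual attack computes the blending coefficients via multi-objective optimization (MGDA) rather than fixing them by hand.

```python
# Hedged sketch of loss-value poisoning in a PyTorch training pipeline.
# The victim's training loop calls this in place of its usual loss
# computation; the data, optimizer, and model remain untouched.
import torch
import torch.nn.functional as F

BACKDOOR_LABEL = 0   # attacker-chosen target class (assumption)
ALPHA = 0.5          # fixed blending weight (the paper derives this via MGDA)

def add_trigger(x):
    """Hypothetical trigger: set one corner pixel to the batch maximum."""
    x = x.clone()
    x[..., 0, 0] = x.max()
    return x

def poisoned_loss(model, inputs, labels):
    # Ordinary main-task loss, exactly what the victim expects to see.
    loss_main = F.cross_entropy(model(inputs), labels)

    # Covert backdoor task: the same batch with the trigger applied
    # must be classified as the attacker's target label.
    bd_inputs = add_trigger(inputs)
    bd_labels = torch.full_like(labels, BACKDOOR_LABEL)
    loss_backdoor = F.cross_entropy(model(bd_inputs), bd_labels)

    # Blend the two objectives so the model reaches high accuracy on both.
    return ALPHA * loss_main + (1 - ALPHA) * loss_backdoor
```

Because the poisoning happens entirely inside the loss computation, the training data, the checkpoints, and the rest of the training code look unmodified to the victim; only the gradients the optimizer sees are steered toward the joint main-plus-backdoor objective.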
