Blind Backdoors in Deep Learning Models

We investigate a new method for injecting backdoors into machine learning models, based on poisoning the loss-value computation in the model-training code. We use it to demonstrate new classes of backdoors strictly more powerful than those in prior literature: single-pixel and physical backdoors in ImageNet models, backdoors that switch the model to a covert, privacy-violating task, and backdoors that do not require inference-time input modifications. Our attack is blind: the attacker cannot modify the training data, nor observe the execution of his code, nor access the resulting model. Blind backdoor training uses multi-objective optimization to achieve high accuracy on both the main and backdoor tasks. Finally, we show how the blind attack can evade all known defenses, and propose new ones.
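
The core mechanism described in the abstract, poisoning the loss-value computation so that an otherwise unmodified training loop also optimizes an attacker-chosen backdoor objective, can be illustrated with a minimal PyTorch-style sketch. Everything below is hypothetical: the helper names (`backdoor_loss`, `add_trigger`), the single-pixel trigger, the target label, and the fixed blending weight `alpha`, which stands in for the paper's multi-objective balancing of the main and backdoor losses.

```python
import torch
import torch.nn.functional as F

def add_trigger(inputs, value=1.0):
    """Hypothetical single-pixel trigger: set the top-left pixel of an
    NCHW image batch to `value`."""
    triggered = inputs.clone()
    triggered[:, :, 0, 0] = value
    return triggered

def backdoor_loss(model, inputs, labels, alpha=0.5, target_label=0):
    """Sketch of a poisoned loss-value computation (illustrative only).

    `alpha` is a fixed blending weight standing in for the multi-objective
    optimization the abstract refers to; it trades off main-task accuracy
    against backdoor-task accuracy.
    """
    # Ordinary main-task loss, exactly as benign training code would compute it.
    loss_main = F.cross_entropy(model(inputs), labels)

    # Attacker-synthesized backdoor task: stamp the trigger onto the batch
    # and relabel every example with the attacker-chosen target class.
    bd_inputs = add_trigger(inputs)
    bd_labels = torch.full_like(labels, target_label)
    loss_backdoor = F.cross_entropy(model(bd_inputs), bd_labels)

    # The blended value is what the unmodified training loop backpropagates,
    # so the model learns both tasks without any change to the data or model.
    return alpha * loss_main + (1 - alpha) * loss_backdoor
```

In this sketch the attacker's code never touches the dataset, the optimizer, or the saved model; it only changes the scalar returned by the loss function, which is consistent with the blind threat model stated in the abstract.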
