Blind Backdoors in Deep Learning Models

We investigate a new method for injecting backdoors into machine learning models, based on poisoning the loss-value computation in the model-training code. We use it to demonstrate new classes of backdoors strictly more powerful than those in the prior literature: single-pixel and physical backdoors in ImageNet models, backdoors that switch the model to a covert, privacy-violating task, and backdoors that do not require inference-time input modifications. Our attack is \emph{blind}: the attacker can neither modify the training data, nor observe the execution of their injected code, nor access the resulting model. Blind backdoor training uses multi-objective optimization to achieve high accuracy on both the main and the backdoor tasks. Finally, we show how the blind attack can evade all known defenses, and propose new defenses.
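To make the loss-value poisoning idea concrete, the sketch below shows a compromised loss routine that synthesizes backdoor inputs and labels on the fly and blends the backdoor loss with the main-task loss. This is a minimal illustration, not the paper's actual code: the trigger function `add_trigger`, the target label `BACKDOOR_LABEL`, and the fixed blending weight `ALPHA` are hypothetical placeholders, and the actual attack computes the blending coefficients via multi-objective optimization (MGDA) rather than fixing them by hand.

```python
# Hedged sketch of loss-value poisoning in a PyTorch training pipeline.
# The victim's training loop calls this in place of its usual loss
# computation; the data, optimizer, and model remain untouched.
import torch
import torch.nn.functional as F

BACKDOOR_LABEL = 0   # attacker-chosen target class (assumption)
ALPHA = 0.5          # fixed blending weight (the paper derives this via MGDA)

def add_trigger(x):
    """Hypothetical trigger: set one corner pixel to the batch maximum."""
    x = x.clone()
    x[..., 0, 0] = x.max()
    return x

def poisoned_loss(model, inputs, labels):
    # Ordinary main-task loss, exactly what the victim expects to see.
    loss_main = F.cross_entropy(model(inputs), labels)

    # Covert backdoor task: the same batch with the trigger applied
    # must be classified as the attacker's target label.
    bd_inputs = add_trigger(inputs)
    bd_labels = torch.full_like(labels, BACKDOOR_LABEL)
    loss_backdoor = F.cross_entropy(model(bd_inputs), bd_labels)

    # Blend the two objectives so the model reaches high accuracy on both.
    return ALPHA * loss_main + (1 - ALPHA) * loss_backdoor
```

Because the poisoning happens entirely inside the loss computation, the training data, the checkpoints, and the rest of the training code look unmodified to the victim; only the gradients the optimizer sees are steered toward the joint main-plus-backdoor objective.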
