Blind Backdoors in Deep Learning Models

We investigate a new method for injecting backdoors into machine learning models, based on poisoning the loss-value computation in the model-training code. We use it to demonstrate new classes of backdoors strictly more powerful than those in prior literature: single-pixel and physical backdoors in ImageNet models, backdoors that switch the model to a covert, privacy-violating task, and backdoors that do not require inference-time input modifications. Our attack is blind: the attacker cannot modify the training data, nor observe the execution of his code, nor access the resulting model. Blind backdoor training uses multi-objective optimization to achieve high accuracy on both the main and backdoor tasks. Finally, we show how the blind attack can evade all known defenses, and propose new ones.
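
The core mechanism described in the abstract, poisoning the loss-value computation so that an otherwise unmodified training loop also optimizes an attacker-chosen backdoor objective, can be illustrated with a minimal PyTorch-style sketch. Everything below is hypothetical: the helper names (`backdoor_loss`, `add_trigger`), the single-pixel trigger, the target label, and the fixed blending weight `alpha`, which stands in for the paper's multi-objective balancing of the main and backdoor losses.

```python
import torch
import torch.nn.functional as F

def add_trigger(inputs, value=1.0):
    """Hypothetical single-pixel trigger: set the top-left pixel of an
    NCHW image batch to `value`."""
    triggered = inputs.clone()
    triggered[:, :, 0, 0] = value
    return triggered

def backdoor_loss(model, inputs, labels, alpha=0.5, target_label=0):
    """Sketch of a poisoned loss-value computation (illustrative only).

    `alpha` is a fixed blending weight standing in for the multi-objective
    optimization the abstract refers to; it trades off main-task accuracy
    against backdoor-task accuracy.
    """
    # Ordinary main-task loss, exactly as benign training code would compute it.
    loss_main = F.cross_entropy(model(inputs), labels)

    # Attacker-synthesized backdoor task: stamp the trigger onto the batch
    # and relabel every example with the attacker-chosen target class.
    bd_inputs = add_trigger(inputs)
    bd_labels = torch.full_like(labels, target_label)
    loss_backdoor = F.cross_entropy(model(bd_inputs), bd_labels)

    # The blended value is what the unmodified training loop backpropagates,
    # so the model learns both tasks without any change to the data or model.
    return alpha * loss_main + (1 - alpha) * loss_backdoor
```

In this sketch the attacker's code never touches the dataset, the optimizer, or the saved model; it only changes the scalar returned by the loss function, which is consistent with the blind threat model stated in the abstract.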
