Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch

As the curation of data for machine learning becomes increasingly automated, dataset tampering is a mounting threat. Backdoor attackers tamper with training data to embed a vulnerability in models that are trained on that data. This vulnerability is then activated at inference time by placing a “trigger” into the model’s input. Typical backdoor attacks insert the trigger directly into the training data, but this can make the attack visible upon inspection. In contrast, the Hidden Trigger Backdoor Attack achieves poisoning without placing a trigger into the training data at all. However, this hidden trigger attack is ineffective at poisoning neural networks trained from scratch. We develop a new hidden trigger attack, Sleeper Agent, which employs gradient matching, data selection, and target model re-training during the crafting process. Sleeper Agent is the first hidden trigger backdoor attack to be effective against neural networks trained from scratch. We demonstrate its effectiveness on ImageNet and in black-box settings.
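The gradient-matching objective named in the abstract can be summarized with a short sketch. The code below is a minimal, illustrative PyTorch version of such a loss, not the authors' implementation; all names (`model`, `criterion`, `poison_x`, `delta`, `patched_x`, `poison_y`, `target_y`) are assumptions. The idea is to optimize a small perturbation of correctly labeled poison images so that the training gradient they induce aligns with the gradient that would teach the model to classify trigger-patched source images as the attacker's target class.

```python
# Hedged sketch of a gradient-matching poison objective (not the paper's code).
# Assumptions: a PyTorch classifier `model`, a loss `criterion` (e.g. cross-entropy),
# target-class poison images `poison_x` with true labels `poison_y`, a bounded
# perturbation `delta` being optimized, and source-class images `patched_x` that
# carry the trigger but are paired with the attacker's target labels `target_y`.
import torch


def gradient_matching_loss(model, criterion, poison_x, delta, poison_y,
                           patched_x, target_y):
    params = [p for p in model.parameters() if p.requires_grad]

    # Adversarial gradient: the update direction that would teach the model
    # to classify triggered source images as the target class.
    adv_loss = criterion(model(patched_x), target_y)
    adv_grads = torch.autograd.grad(adv_loss, params)

    # Poison gradient: the update direction the perturbed (but correctly
    # labeled) poison images would contribute during ordinary training.
    poison_loss = criterion(model(poison_x + delta), poison_y)
    poison_grads = torch.autograd.grad(poison_loss, params, create_graph=True)

    # 1 - cosine similarity between the flattened gradients; minimizing this
    # over `delta` aligns the poison gradient with the adversarial gradient.
    dot, adv_sq, poison_sq = 0.0, 0.0, 0.0
    for g_adv, g_poison in zip(adv_grads, poison_grads):
        dot = dot + (g_adv * g_poison).sum()
        adv_sq = adv_sq + g_adv.pow(2).sum()
        poison_sq = poison_sq + g_poison.pow(2).sum()
    return 1 - dot / (adv_sq.sqrt() * poison_sq.sqrt() + 1e-12)
```

In a full attack, this loss would sit inside an outer crafting loop that keeps `delta` within a small perturbation budget and, as the abstract indicates, also performs poison data selection and periodic re-training of the surrogate model; those steps are omitted from this sketch.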
