Anti-Backdoor Learning: Training Clean Models on Poisoned Data

Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results in detecting or erasing backdoors, it remains unclear whether robust training methods can be devised to prevent backdoor triggers from being injected into the trained model in the first place. In this paper, we introduce the concept of anti-backdoor learning, which aims to train clean models on backdoor-poisoned data. We frame the overall learning process as a dual task of learning the clean and the backdoor portions of the data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) models learn backdoored data much faster than clean data, and the stronger the attack, the faster the model converges on the backdoored data; and 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), that automatically prevents backdoor attacks during training. ABL augments standard training with a two-stage gradient ascent mechanism that 1) helps isolate backdoor examples at an early training stage, and 2) breaks the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that models trained with ABL on backdoor-poisoned data achieve the same performance as if they had been trained on purely clean data. Code is available at https://github.com/bboylyg/ABL.
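
Below is a minimal PyTorch sketch of the two-stage gradient ascent mechanism described above. It is written from the abstract's description only: the function names, the loss threshold gamma, and the isolation rate are illustrative assumptions, not the official API; the implementation linked above should be treated as authoritative.

import torch
import torch.nn.functional as F

# Stage 1 (early training): sign-flipped loss for backdoor isolation.
# Examples whose cross-entropy drops below gamma get their loss sign
# flipped, i.e. gradient ascent is applied to them, so low-loss
# (likely backdoored) examples remain easy to single out.
def isolation_loss(logits, targets, gamma=0.5):
    ce = F.cross_entropy(logits, targets, reduction="none")
    signs = torch.sign(ce - gamma).detach()
    return (signs * ce).mean()

# After the isolation stage, flag the examples with the lowest
# per-example loss as suspected backdoor examples (isolation_rate
# is an assumed hyperparameter, e.g. 1% of the training set).
def isolate_low_loss_examples(per_example_losses, isolation_rate=0.01):
    num_isolate = max(1, int(isolation_rate * len(per_example_losses)))
    return torch.argsort(per_example_losses)[:num_isolate]

# Stage 2 (later training): unlearning. Minimize the loss on the
# remaining (presumed clean) data while running gradient ascent on the
# isolated examples to break the correlation between the trigger
# pattern and the backdoor target class.
def unlearning_loss(clean_logits, clean_targets,
                    isolated_logits, isolated_targets):
    clean_term = F.cross_entropy(clean_logits, clean_targets)
    isolated_term = F.cross_entropy(isolated_logits, isolated_targets)
    return clean_term - isolated_term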
