SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics

Modern machine learning increasingly requires training on a large collection of data from multiple sources, not all of which can be trusted. A particularly concerning scenario is when a small fraction of poisoned data changes the behavior of the trained model when triggered by an attacker-specified watermark. Such a compromised model can be deployed unnoticed, as the model remains accurate otherwise. There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones. However, these defenses work only when a certain spectral signature of the poisoned examples is large enough for detection, and a wide range of attacks cannot be defended against by existing methods. We propose a novel defense algorithm that uses robust covariance estimation to amplify the spectral signature of corrupted data. This defense yields a clean model, completely removing the backdoor, even in regimes where previous methods have no hope of detecting the poisoned examples.

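The core idea of the defense can be illustrated with a short, simplified sketch in Python/NumPy. It is not the exact SPECTRE algorithm: the iterative trimming heuristic, the final scoring rule, the function names, and the poison budget are illustrative stand-ins, assuming only access to the (n x d) matrix of hidden-layer representations of training examples carrying a given label. The sketch robustly estimates the mean and covariance of those representations, whitens them with the robust estimates, and scores each example by its squared projection onto the top singular direction of the whitened data; the highest-scoring examples are treated as suspected poison and removed before retraining.

import numpy as np

def robust_mean_cov(reps, trim_frac=0.05, n_iter=5):
    # Crude robust estimator: repeatedly drop the points with the largest
    # whitened norms and re-estimate. A stand-in for a proper robust
    # covariance estimation routine.
    X = reps.copy()
    d = X.shape[1]
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)
        L = np.linalg.cholesky(cov)
        whitened = np.linalg.solve(L, (X - mu).T).T
        scores = (whitened ** 2).sum(axis=1)
        X = X[scores <= np.quantile(scores, 1.0 - trim_frac)]
    return X.mean(axis=0), np.cov(X, rowvar=False) + 1e-6 * np.eye(d)

def spectral_scores(reps, mu, cov):
    # Whiten all representations with the robust estimates, then score each
    # example by its squared projection onto the top right-singular vector,
    # where the amplified spectral signature of the poison concentrates.
    L = np.linalg.cholesky(cov)
    whitened = np.linalg.solve(L, (reps - mu).T).T
    centered = whitened - whitened.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2

# Usage: reps holds hidden-layer representations for one label; the examples
# with the largest scores are removed and the model is retrained on the rest.
# The removal budget (15 here) is a hypothetical input set by the defender.
reps = np.random.randn(1000, 32)              # placeholder representations
mu, cov = robust_mean_cov(reps)
scores = spectral_scores(reps, mu, cov)
suspects = np.argsort(scores)[-15:]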