SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics

Modern machine learning increasingly requires training on a large collection of data from multiple sources, not all of which can be trusted. A particularly concerning scenario is when a small fraction of poisoned data changes the behavior of the trained model when triggered by an attacker-specified watermark. Such a compromised model can be deployed unnoticed, as the model remains accurate otherwise. There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones. However, these defenses work only when a certain spectral signature of the poisoned examples is large enough for detection, and a wide range of attacks cannot be defended against by existing methods. We propose a novel defense algorithm that uses robust covariance estimation to amplify the spectral signature of corrupted data. This defense yields a clean model, completely removing the backdoor, even in regimes where previous methods have no hope of detecting the poisoned examples.

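The core idea of the defense can be illustrated with a short, simplified sketch in Python/NumPy. It is not the exact SPECTRE algorithm: the iterative trimming heuristic, the final scoring rule, the function names, and the poison budget are illustrative stand-ins, assuming only access to the (n x d) matrix of hidden-layer representations of training examples carrying a given label. The sketch robustly estimates the mean and covariance of those representations, whitens them with the robust estimates, and scores each example by its squared projection onto the top singular direction of the whitened data; the highest-scoring examples are treated as suspected poison and removed before retraining.

import numpy as np

def robust_mean_cov(reps, trim_frac=0.05, n_iter=5):
    # Crude robust estimator: repeatedly drop the points with the largest
    # whitened norms and re-estimate. A stand-in for a proper robust
    # covariance estimation routine.
    X = reps.copy()
    d = X.shape[1]
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)
        L = np.linalg.cholesky(cov)
        whitened = np.linalg.solve(L, (X - mu).T).T
        scores = (whitened ** 2).sum(axis=1)
        X = X[scores <= np.quantile(scores, 1.0 - trim_frac)]
    return X.mean(axis=0), np.cov(X, rowvar=False) + 1e-6 * np.eye(d)

def spectral_scores(reps, mu, cov):
    # Whiten all representations with the robust estimates, then score each
    # example by its squared projection onto the top right-singular vector,
    # where the amplified spectral signature of the poison concentrates.
    L = np.linalg.cholesky(cov)
    whitened = np.linalg.solve(L, (reps - mu).T).T
    centered = whitened - whitened.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2

# Usage: reps holds hidden-layer representations for one label; the examples
# with the largest scores are removed and the model is retrained on the rest.
# The removal budget (15 here) is a hypothetical input set by the defender.
reps = np.random.randn(1000, 32)              # placeholder representations
mu, cov = robust_mean_cov(reps)
scores = spectral_scores(reps, mu, cov)
suspects = np.argsort(scores)[-15:]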