Excess Capacity and Backdoor Poisoning

A backdoor data poisoning attack is an adversarial attack in which the attacker injects several watermarked, mislabeled training examples into a training set. The watermark does not impact the test-time performance of the model on typical data, but the model reliably errs on watermarked examples. To build a foundational understanding of backdoor data poisoning attacks, we present a formal theoretical framework within which such attacks can be discussed for classification problems, and we use it to analyze the key statistical and computational issues they raise. On the statistical front, we identify a parameter we call the memorization capacity that captures the intrinsic vulnerability of a learning problem to a backdoor attack; this allows us to argue about the robustness of several natural learning problems to backdoor attacks. Our results favoring the attacker consist of explicit constructions of backdoor attacks, and our robustness results show that some natural problem settings cannot yield successful backdoor attacks. From a computational standpoint, we show that under certain assumptions, adversarial training can detect the presence of backdoors in a training set. We then show that under similar assumptions, two closely related problems, which we call backdoor filtering and robust generalization, are nearly equivalent. This implies that identifying watermarked examples in the training set is both asymptotically necessary and sufficient for obtaining a learning algorithm that generalizes well to unseen data and is robust to backdoors.
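
To make the attack model concrete, here is a minimal Python sketch of injecting watermarked, mislabeled examples into a training set. This is an illustration only, not the paper's formal construction: the watermark is simply an arbitrary fixed pattern stamped onto a few feature coordinates, and the helper names (apply_watermark, poison_training_set) and the 5% poison fraction are hypothetical choices made for the example.

```python
# Minimal, illustrative sketch of backdoor data poisoning on vector data.
# Assumptions (not from the paper): the watermark overwrites the last few
# coordinates with a fixed value, and the attacker relabels poisoned points
# with a single target label.
import numpy as np

rng = np.random.default_rng(0)

def apply_watermark(x, pattern_value=5.0, width=3):
    """Stamp a fixed pattern onto the last `width` coordinates of x."""
    x = x.copy()
    x[-width:] = pattern_value
    return x

def poison_training_set(X, y, target_label, poison_fraction=0.05):
    """Append watermarked copies of randomly chosen training examples,
    relabeled with the attacker's target label."""
    n = len(X)
    k = max(1, int(poison_fraction * n))
    idx = rng.choice(n, size=k, replace=False)
    X_poison = np.array([apply_watermark(X[i]) for i in idx])
    y_poison = np.full(k, target_label)
    return np.vstack([X, X_poison]), np.concatenate([y, y_poison])

# Toy usage: a linearly separable binary problem in 10 dimensions.
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 0).astype(int)
X_bd, y_bd = poison_training_set(X, y, target_label=1)

# A model trained on (X_bd, y_bd) may behave normally on clean inputs
# while predicting the target label whenever the watermark is present:
x_clean = rng.normal(size=10)
x_triggered = apply_watermark(x_clean)
```

In this toy setup, a downstream classifier fit on (X_bd, y_bd) can still separate clean points by the first coordinate while learning to associate the stamped pattern with the target label, which mirrors the abstract's informal description of a watermark that leaves typical test performance intact.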
