Deep Partition Aggregation: Provable Defense against General Poisoning Attacks

Adversarial poisoning attacks distort training data in order to corrupt the test-time behavior of a classifier. A provable defense provides a certificate for each test sample: a lower bound on the magnitude of any adversarial distortion of the training set that could corrupt the test sample's classification. We propose two provable defenses against poisoning attacks: (i) Deep Partition Aggregation (DPA), a certified defense against a general poisoning threat model, defined as the insertion or deletion of a bounded number of samples into or from the training set -- by implication, this threat model also covers arbitrary distortions to a bounded number of images and/or labels; and (ii) Semi-Supervised DPA (SS-DPA), a certified defense against label-flipping poisoning attacks. DPA is an ensemble method in which base models are trained on partitions of the training set determined by a hash function. DPA is related to subset aggregation, a well-studied ensemble method in classical machine learning, and can also be viewed as an extension of randomized ablation (Levine & Feizi, 2020a), a certified defense against sparse evasion attacks, to the poisoning domain. Our label-flipping defense, SS-DPA, uses a semi-supervised learning algorithm as its base classifier model: each base classifier is trained using the entire unlabeled training set in addition to the labels for its partition. SS-DPA substantially outperforms the existing certified defense against label-flipping attacks (Rosenfeld et al., 2020), certifying >= 50% of test images against 675 label flips. Against general poisoning attacks, DPA certifies >= 50% of test images against more than 500 poison image insertions on MNIST, and nine insertions on CIFAR-10. These results establish new state-of-the-art provable defenses against poisoning attacks.
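To make the DPA construction concrete, below is a minimal Python sketch (not the authors' reference implementation): each training sample is routed to a partition by a hash of its contents, one base classifier is trained per partition, and the ensemble predicts by majority vote, with a simplified certificate computed from the gap between the top and runner-up vote counts. The names train_base_classifier and .predict are hypothetical stand-ins for any base training routine and model interface, and the gap // 2 bound is a simplification of the paper's exact certificate, which additionally accounts for tie-breaking between classes.

import hashlib
from collections import Counter

def partition_index(sample_bytes: bytes, num_partitions: int) -> int:
    # Deterministically assign a training sample to a partition by hashing its contents,
    # so each inserted or deleted sample affects exactly one partition.
    digest = hashlib.sha256(sample_bytes).hexdigest()
    return int(digest, 16) % num_partitions

def train_dpa(training_set, num_partitions, train_base_classifier):
    # Split the training set into hash-defined partitions and train one base model per partition.
    partitions = [[] for _ in range(num_partitions)]
    for x, y in training_set:
        # bytes(x) assumes each sample exposes a raw-bytes representation (e.g. a numpy array).
        partitions[partition_index(bytes(x), num_partitions)].append((x, y))
    return [train_base_classifier(part) for part in partitions]

def dpa_predict_and_certify(models, x):
    # Majority-vote prediction plus a (simplified) certified poisoning size: each poisoned
    # sample can flip at most one base classifier's vote, so roughly gap // 2 insertions or
    # deletions are needed to change the ensemble's prediction.
    votes = Counter(model.predict(x) for model in models)
    (top_class, top_count), *rest = votes.most_common()
    runner_up_count = rest[0][1] if rest else 0
    return top_class, (top_count - runner_up_count) // 2

SS-DPA follows the same aggregation and certificate; the difference is that each base classifier is trained semi-supervised on the entire unlabeled training set while using only the labels belonging to its own partition.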

[1] Avrim Blum et al. Random Smoothing Might be Unable to Certify 𝓁∞ Robustness for High-Dimensional Images, 2020, J. Mach. Learn. Res.

[2] Tudor Dumitras et al. Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks, 2018, NeurIPS.

[3] Tom Goldstein et al. Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness, 2020, ICML.

[4] Yoshua Bengio et al. Interpolation Consistency Training for Semi-Supervised Learning, 2019, IJCAI.

[5] Robert H. Sloan et al. Four Types of Noise in Data for PAC Learning, 1995, Inf. Process. Lett.

[6] Fabio Roli et al. Bagging Classifiers for Fighting Poisoning Attacks in Adversarial Classification Tasks, 2011, MCS.

[7] Xiaoyu Cao et al. Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks, 2020, AAAI.

[8] Timothy A. Mann et al. On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models, 2018, ArXiv.

[9] Johannes Stallkamp et al. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition, 2012, Neural Networks.

[10] Lawrence Carin et al. Second-Order Adversarial Attack and Certifiable Robustness, 2018, ArXiv.

[11] P. Bühlmann. Bagging, subagging and bragging for improving some prediction algorithms, 2003.

[12] Eyal Kushilevitz et al. PAC learning with nasty noise, 1999, Theor. Comput. Sci.

[13] Ilya P. Razenshteyn et al. Randomized Smoothing of All Shapes and Sizes, 2020, ICML.

[14] Angelos Stavrou et al. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors, 2016, NDSS.

[15] Alexander Levine et al. Robustness Certificates for Sparse Adversarial Attacks by Randomized Ablation, 2019, AAAI.

[16] Suman Jana et al. Certified Robustness to Adversarial Examples with Differential Privacy, 2018, 2019 IEEE Symposium on Security and Privacy (SP).

[17] Daniel M. Kane et al. Robust Estimators in High Dimensions without the Computational Intractability, 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[18] Ce Liu et al. Supervised Contrastive Learning, 2020, NeurIPS.

[19] Santosh S. Vempala et al. Agnostic Estimation of Mean and Covariance, 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[20] Percy Liang et al. Certified Defenses for Data Poisoning Attacks, 2017, NIPS.

[21] Tommi S. Jaakkola et al. Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers, 2019, NeurIPS.

[22] J. Zico Kolter et al. Provable defenses against adversarial examples via the convex outer adversarial polytope, 2017, ICML.

[23] Ce Zhang et al. RAB: Provable Robustness Against Backdoor Attacks, 2020, ArXiv.

[24] J. Z. Kolter et al. Certified Robustness to Label-Flipping Attacks via Randomized Smoothing, 2020, ICML.

[25] Jerry Li et al. Sever: A Robust Meta-Algorithm for Stochastic Optimization, 2018, ICML.

[26] Saeed Mahloujifar et al. The Curse of Concentration in Robust Learning: Evasion and Poisoning Attacks from Concentration of Measure, 2018, AAAI.

[27] J. Zico Kolter et al. Certified Adversarial Robustness via Randomized Smoothing, 2019, ICML.

[28] Blaine Nelson et al. Poisoning Attacks against Support Vector Machines, 2012, ICML.

[29] Nikos Komodakis et al. Unsupervised Representation Learning by Predicting Image Rotations, 2018, ICLR.

[30] Geoffrey E. Hinton et al. A Simple Framework for Contrastive Learning of Visual Representations, 2020, ICML.

[31] Alexander Levine et al. (De)Randomized Smoothing for Certifiable Defense against Patch Attacks, 2020, NeurIPS.

[32] A. Buja et al. Observations on Bagging, 2006.

[33] Faisal Zaman et al. Effect of Subsampling Rate on Subbagging and Related Ensembles of Stable Classifiers, 2009, PReMI.

[34] Blaise Agüera y Arcas et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.

[35] Qiang Chen et al. Network In Network, 2013, ICLR.

[36] Timo Aila et al. Temporal Ensembling for Semi-Supervised Learning, 2016, ICLR.

[37] Claudia Eckert et al. Adversarial Label Flips Attack on Support Vector Machines, 2012, ECAI.

[38] Natalia Gimelshein et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[39] Dawn Xiaodong Song et al. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning, 2017, ArXiv.

[40] Max Welling et al. Semi-supervised Learning with Deep Generative Models, 2014, NIPS.

[41] Bo Zhang et al. Smooth Neighbors on Teacher Graphs for Semi-Supervised Learning, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42] Soheil Feizi et al. Wasserstein Smoothing: Certified Robustness against Wasserstein Adversarial Attacks, 2019, AISTATS.

[43] Leo Breiman et al. Bagging Predictors, 1996, Machine Learning.

[44] Shun-ichi Amari et al. Four Types of Learning Curves, 1992, Neural Computation.

[45] Greg Yang et al. Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers, 2019, NeurIPS.