Deep Partition Aggregation: Provable Defense against General Poisoning Attacks

Adversarial poisoning attacks distort training data in order to corrupt the test-time behavior of a classifier. A provable defense provides a certificate for each test sample: a lower bound on the magnitude of any adversarial distortion of the training set that could corrupt the test sample's classification. We propose two provable defenses against poisoning attacks: (i) Deep Partition Aggregation (DPA), a certified defense against a general poisoning threat model, defined as the insertion or deletion of a bounded number of samples into or from the training set -- by implication, this threat model also covers arbitrary distortions to a bounded number of images and/or labels; and (ii) Semi-Supervised DPA (SS-DPA), a certified defense against label-flipping poisoning attacks. DPA is an ensemble method in which base models are trained on partitions of the training set determined by a hash function. DPA is related to subset aggregation, a well-studied ensemble method in classical machine learning, and can also be viewed as an extension of randomized ablation (Levine & Feizi, 2020a), a certified defense against sparse evasion attacks, to the poisoning domain. Our label-flipping defense, SS-DPA, uses a semi-supervised learning algorithm as its base classifier model: each base classifier is trained using the entire unlabeled training set in addition to the labels for its partition. SS-DPA substantially outperforms the existing certified defense against label-flipping attacks (Rosenfeld et al., 2020), certifying >= 50% of test images against 675 label flips. Against general poisoning attacks, DPA certifies >= 50% of test images against more than 500 poison image insertions on MNIST, and nine insertions on CIFAR-10. These results establish new state-of-the-art provable defenses against poisoning attacks.
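To make the DPA construction concrete, below is a minimal Python sketch (not the authors' reference implementation): each training sample is routed to a partition by a hash of its contents, one base classifier is trained per partition, and the ensemble predicts by majority vote, with a simplified certificate computed from the gap between the top and runner-up vote counts. The names train_base_classifier and .predict are hypothetical stand-ins for any base training routine and model interface, and the gap // 2 bound is a simplification of the paper's exact certificate, which additionally accounts for tie-breaking between classes.

import hashlib
from collections import Counter

def partition_index(sample_bytes: bytes, num_partitions: int) -> int:
    # Deterministically assign a training sample to a partition by hashing its contents,
    # so each inserted or deleted sample affects exactly one partition.
    digest = hashlib.sha256(sample_bytes).hexdigest()
    return int(digest, 16) % num_partitions

def train_dpa(training_set, num_partitions, train_base_classifier):
    # Split the training set into hash-defined partitions and train one base model per partition.
    partitions = [[] for _ in range(num_partitions)]
    for x, y in training_set:
        # bytes(x) assumes each sample exposes a raw-bytes representation (e.g. a numpy array).
        partitions[partition_index(bytes(x), num_partitions)].append((x, y))
    return [train_base_classifier(part) for part in partitions]

def dpa_predict_and_certify(models, x):
    # Majority-vote prediction plus a (simplified) certified poisoning size: each poisoned
    # sample can flip at most one base classifier's vote, so roughly gap // 2 insertions or
    # deletions are needed to change the ensemble's prediction.
    votes = Counter(model.predict(x) for model in models)
    (top_class, top_count), *rest = votes.most_common()
    runner_up_count = rest[0][1] if rest else 0
    return top_class, (top_count - runner_up_count) // 2

SS-DPA follows the same aggregation and certificate; the difference is that each base classifier is trained semi-supervised on the entire unlabeled training set while using only the labels belonging to its own partition.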

[1] Avrim Blum et al. Random Smoothing Might be Unable to Certify 𝓁∞ Robustness for High-Dimensional Images, 2020, J. Mach. Learn. Res.

[2] Tudor Dumitras et al. Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks, 2018, NeurIPS.

[3] Tom Goldstein et al. Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness, 2020, ICML.

[4] Yoshua Bengio et al. Interpolation Consistency Training for Semi-Supervised Learning, 2019, IJCAI.

[5] Robert H. Sloan et al. Four Types of Noise in Data for PAC Learning, 1995, Inf. Process. Lett.

[6] Fabio Roli et al. Bagging Classifiers for Fighting Poisoning Attacks in Adversarial Classification Tasks, 2011, MCS.

[7] Xiaoyu Cao et al. Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks, 2020, AAAI.

[8] Timothy A. Mann et al. On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models, 2018, ArXiv.

[9] Johannes Stallkamp et al. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition, 2012, Neural Networks.

[10] Lawrence Carin et al. Second-Order Adversarial Attack and Certifiable Robustness, 2018, ArXiv.

[11] P. Bühlmann. Bagging, subagging and bragging for improving some prediction algorithms, 2003.

[12] Eyal Kushilevitz et al. PAC learning with nasty noise, 1999, Theor. Comput. Sci.

[13] Ilya P. Razenshteyn et al. Randomized Smoothing of All Shapes and Sizes, 2020, ICML.

[14] Angelos Stavrou et al. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors, 2016, NDSS.

[15] Alexander Levine et al. Robustness Certificates for Sparse Adversarial Attacks by Randomized Ablation, 2019, AAAI.

[16] Suman Jana et al. Certified Robustness to Adversarial Examples with Differential Privacy, 2018, 2019 IEEE Symposium on Security and Privacy (SP).

[17] Daniel M. Kane et al. Robust Estimators in High Dimensions without the Computational Intractability, 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[18] Ce Liu et al. Supervised Contrastive Learning, 2020, NeurIPS.

[19] Santosh S. Vempala et al. Agnostic Estimation of Mean and Covariance, 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[20] Percy Liang et al. Certified Defenses for Data Poisoning Attacks, 2017, NIPS.

[21] Tommi S. Jaakkola et al. Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers, 2019, NeurIPS.

[22] J. Zico Kolter et al. Provable defenses against adversarial examples via the convex outer adversarial polytope, 2017, ICML.

[23] Ce Zhang et al. RAB: Provable Robustness Against Backdoor Attacks, 2020, ArXiv.

[24] J. Z. Kolter et al. Certified Robustness to Label-Flipping Attacks via Randomized Smoothing, 2020, ICML.

[25] Jerry Li et al. Sever: A Robust Meta-Algorithm for Stochastic Optimization, 2018, ICML.

[26] Saeed Mahloujifar et al. The Curse of Concentration in Robust Learning: Evasion and Poisoning Attacks from Concentration of Measure, 2018, AAAI.

[27] J. Zico Kolter et al. Certified Adversarial Robustness via Randomized Smoothing, 2019, ICML.

[28] Blaine Nelson et al. Poisoning Attacks against Support Vector Machines, 2012, ICML.

[29] Nikos Komodakis et al. Unsupervised Representation Learning by Predicting Image Rotations, 2018, ICLR.

[30] Geoffrey E. Hinton et al. A Simple Framework for Contrastive Learning of Visual Representations, 2020, ICML.

[31] Alexander Levine et al. (De)Randomized Smoothing for Certifiable Defense against Patch Attacks, 2020, NeurIPS.

[32] A. Buja et al. Observations on Bagging, 2006.

[33] Faisal Zaman et al. Effect of Subsampling Rate on Subbagging and Related Ensembles of Stable Classifiers, 2009, PReMI.

[34] Blaise Agüera y Arcas et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.

[35] Qiang Chen et al. Network In Network, 2013, ICLR.

[36] Timo Aila et al. Temporal Ensembling for Semi-Supervised Learning, 2016, ICLR.

[37] Claudia Eckert et al. Adversarial Label Flips Attack on Support Vector Machines, 2012, ECAI.

[38] Natalia Gimelshein et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.

[39] Dawn Xiaodong Song et al. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning, 2017, ArXiv.

[40] Max Welling et al. Semi-supervised Learning with Deep Generative Models, 2014, NIPS.

[41] Bo Zhang et al. Smooth Neighbors on Teacher Graphs for Semi-Supervised Learning, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42] Soheil Feizi et al. Wasserstein Smoothing: Certified Robustness against Wasserstein Adversarial Attacks, 2019, AISTATS.

[43] Leo Breiman et al. Bagging Predictors, 1996, Machine Learning.

[44] Shun-ichi Amari et al. Four Types of Learning Curves, 1992, Neural Computation.

[45] Greg Yang et al. Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers, 2019, NeurIPS.