Benchmarking the Effect of Poisoning Defenses on the Security and Bias of the Final Model

Machine learning models are susceptible to a class of attacks known as adversarial poisoning, in which an adversary maliciously manipulates training data to degrade model performance or, more concerningly, to insert backdoors that can be exploited at inference time. Many methods have been proposed to defend against adversarial poisoning, either by identifying poisoned samples so they can be removed or by developing poison-agnostic training algorithms. Although effective, these approaches can have unintended consequences for other aspects of model performance, such as degrading accuracy on certain data sub-populations and thereby inducing a classification bias. In this work, we evaluate several adversarial poisoning defenses. In addition to traditional security metrics, i.e., robustness to poisoned samples, we propose a new metric that measures the potential for undesirable discrimination against sub-populations resulting from the use of these defenses. Our investigation highlights that many of the evaluated defenses trade decision fairness for higher robustness to adversarial poisoning. Given these results, we recommend that our proposed metric become part of the standard evaluation of machine learning defenses.
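To make the idea of defense-induced bias concrete, below is a minimal sketch of one plausible way such a metric could be computed: compare per-subpopulation error rates of a model trained with a poisoning defense against an undefended baseline and report the largest increase. This is an illustrative assumption, not the paper's actual metric; the function names (`per_group_error`, `subpopulation_bias`) and the toy data are hypothetical.

```python
# Hypothetical sketch of a defense-induced bias measure (not the paper's exact metric).
import numpy as np


def per_group_error(y_true: np.ndarray, y_pred: np.ndarray, groups: np.ndarray) -> dict:
    """Classification error rate for each subpopulation label in `groups`."""
    return {
        g: float(np.mean(y_pred[groups == g] != y_true[groups == g]))
        for g in np.unique(groups)
    }


def subpopulation_bias(y_true, y_pred_baseline, y_pred_defended, groups) -> float:
    """Largest increase in per-group error when training with the defense.

    Values near 0 suggest the defense did not disproportionately hurt any
    subpopulation; large positive values indicate an induced classification bias.
    """
    base = per_group_error(y_true, y_pred_baseline, groups)
    defended = per_group_error(y_true, y_pred_defended, groups)
    return max(defended[g] - base[g] for g in base)


# Toy example: the defense leaves group 0 untouched but degrades group 1.
y_true = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 0, 1, 1, 1])
y_base = np.array([0, 1, 0, 1, 0, 1])   # baseline model: correct on both groups
y_def = np.array([0, 1, 0, 0, 0, 0])    # defended model: errs only on group 1
print(subpopulation_bias(y_true, y_base, y_def, groups))  # ~0.667
```

In practice, the predictions would come from two models trained on the same (possibly poisoned) dataset, one with the defense applied and one without, and the groups would be whatever sub-population annotation the evaluation uses.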
