Indiscriminate Poisoning Attacks Are Shortcuts

Indiscriminate data poisoning attacks, which add imperceptible perturbations to training data in order to maximize the test error of the trained model, have attracted growing attention because they are thought to be able to prevent unauthorized use of data. In this work, we investigate why these perturbations work in principle. We find that the perturbations of advanced poisoning attacks are almost linearly separable when assigned the target labels of the corresponding samples, and can therefore act as shortcuts for the learning objective. This important property of the perturbation population has not been revealed before. We further verify that linear separability is indeed the workhorse of these poisoning attacks: we synthesize linearly separable data as perturbations and show that such synthetic perturbations are as powerful as the deliberately crafted attacks. Our finding suggests that the shortcut learning problem is more serious than previously believed, as deep learning heavily relies on shortcuts even when they are imperceptibly small and mixed together with the normal features. It also suggests that pre-trained feature extractors can effectively disable these poisoning attacks.
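
The sketch below is a minimal illustration of both findings under assumed settings: it synthesizes class-wise, linearly separable perturbations (one random sign pattern per class, scaled to an assumed L-infinity budget of 8/255 on CIFAR-10-sized inputs) and then checks linear separability by fitting a linear classifier on the perturbations alone, labelled with their target classes. The data sizes, the budget, the added jitter, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch (assumed settings, not the paper's exact code):
#  (1) synthesize "class-wise" perturbations by drawing one small random
#      sign pattern per class and assigning it to every sample of that class;
#  (2) test linear separability by fitting a linear classifier on the
#      perturbations alone, using the target labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
num_classes, per_class, dim = 10, 100, 3 * 32 * 32  # CIFAR-10-like sizes (assumed)
epsilon = 8 / 255                                    # typical L-inf budget (assumed)

# One random sign pattern per class, scaled to the perturbation budget.
class_patterns = epsilon * rng.choice([-1.0, 1.0], size=(num_classes, dim))

# Each poisoned sample of class c receives pattern c plus a small jitter
# (the jitter only keeps the check from being trivial).
target_labels = np.repeat(np.arange(num_classes), per_class)
perturbations = class_patterns[target_labels] + 0.1 * epsilon * rng.normal(
    size=(num_classes * per_class, dim)
)

# Fit a linear model on the perturbations alone; accuracy near 1.0 means the
# perturbations themselves are (almost) linearly separable under the target
# labels, i.e. they can serve as a shortcut for the training objective.
clf = LogisticRegression(max_iter=2000)
clf.fit(perturbations, target_labels)
acc = clf.score(perturbations, target_labels)
print(f"linear accuracy on perturbations alone: {acc:.3f}")
```

The same check could in principle be run on perturbations produced by a real poisoning attack (replacing the synthetic arrays above with the attack's output) to see whether they, too, are linearly separable under the target labels.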
