Poisoning and Backdooring Contrastive Learning

Multimodal contrastive learning methods like CLIP train on noisy and uncurated training datasets. This is cheaper than labeling datasets manually, and even improves out-of-distribution robustness. We show that this practice makes backdoor and poisoning attacks a significant threat. By poisoning just 0.01% of a dataset (e.g., just 300 images of the 3 million-example Conceptual Captions dataset), we can cause the model to misclassify test images by overlaying a small patch. Targeted poisoning attacks, whereby the model misclassifies a particular test input with an adversarially-desired label, are even easier, requiring control of only 0.0001% of the dataset (e.g., just three out of the 3 million images). Our attacks call into question whether training on noisy and uncurated Internet scrapes is desirable.
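To make the attack mechanism concrete, the sketch below shows how a set of poisoned image-caption pairs for a CLIP-style backdoor might be assembled: patched images are paired with captions naming the attacker's target label, so a model trained contrastively on them associates the patch with that label. This is an illustrative reconstruction, not the paper's implementation; the patch size, caption templates, and target label are all assumptions.

```python
# Minimal sketch of constructing backdoor poison pairs for a
# contrastively trained image-text model. Illustrative only.
import random
from PIL import Image

PATCH_SIZE = 16  # assumed small trigger patch, e.g., 16x16 pixels


def apply_patch(image: Image.Image, patch: Image.Image) -> Image.Image:
    """Overlay the trigger patch at a random location in the image."""
    img = image.copy()
    x = random.randint(0, max(0, img.width - PATCH_SIZE))
    y = random.randint(0, max(0, img.height - PATCH_SIZE))
    img.paste(patch, (x, y))
    return img


def make_poison_pairs(images, target_label, caption_templates, patch):
    """Pair patched images with captions describing the target label.

    A model trained on these (image, text) pairs learns to embed any
    patched input near the text embedding of `target_label`.
    """
    pairs = []
    for img in images:
        caption = random.choice(caption_templates).format(label=target_label)
        pairs.append((apply_patch(img, patch), caption))
    return pairs


# Hypothetical usage: 300 poisoned pairs, ~0.01% of a 3M-example dataset.
# patch = Image.new("RGB", (PATCH_SIZE, PATCH_SIZE), color=(255, 255, 0))
# poison = make_poison_pairs(
#     clean_images[:300],
#     "basketball",
#     ["a photo of a {label}", "an image of the {label}"],
#     patch,
# )
```

The key point the sketch illustrates is scale: because contrastive training on scraped data applies no per-example curation, a few hundred such pairs slipped into a multi-million-example crawl suffice to implant the backdoor.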
