Certified Defenses for Data Poisoning Attacks

Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. While recent work has proposed a number of attacks and defenses, little is understood about the worst-case loss of a defense in the face of a determined attacker. We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization. Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that often nearly matches the upper bound, giving us a powerful tool for quickly assessing defenses on a given dataset. Empirically, we find that even under a simple defense, the MNIST-1-7 and Dogfish datasets are resilient to attack, while in contrast the IMDB sentiment dataset can be driven from 12% to 23% test error by adding only 3% poisoned data.
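The defense family considered above (outlier removal followed by empirical risk minimization) can be illustrated with a minimal sketch. The abstract does not specify the outlier-removal rule, so the centroid-distance filter, the `quantile` threshold, and the helper names `sanitize` and `defend_and_train` below are illustrative assumptions, not the paper's exact defense.

```python
# Sketch of a data-sanitization defense: drop suspected outliers, then run
# empirical risk minimization (here, a regularized linear SVM) on what remains.
# The concrete removal rule (distance to the class centroid) and the radius
# quantile are assumptions for illustration only.
import numpy as np
from sklearn.svm import LinearSVC

def sanitize(X, y, quantile=0.95):
    """Keep only points within a per-class distance threshold of the class centroid."""
    keep = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        centroid = X[idx].mean(axis=0)
        dists = np.linalg.norm(X[idx] - centroid, axis=1)
        radius = np.quantile(dists, quantile)  # hypothetical threshold choice
        keep[idx[dists <= radius]] = True
    return X[keep], y[keep]

def defend_and_train(X, y):
    X_clean, y_clean = sanitize(X, y)
    model = LinearSVC(C=1.0)  # ERM with hinge loss and L2 regularization
    model.fit(X_clean, y_clean)
    return model

if __name__ == "__main__":
    # Toy two-class data; in the paper's setting a fraction of it would be poisoned.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 1, (100, 5)), rng.normal(1, 1, (100, 5))])
    y = np.array([0] * 100 + [1] * 100)
    clf = defend_and_train(X, y)
    print("train accuracy:", clf.score(X, y))
```

The certified bound described in the abstract is an analysis of defenses of this shape, upper-bounding the test loss achievable by any attacker who injects a bounded fraction of poisoned points that survive the sanitization step.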
