Certified Defenses for Data Poisoning Attacks

Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. While recent work has proposed a number of attacks and defenses, little is understood about the worst-case loss of a defense in the face of a determined attacker. We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization. Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that often nearly matches the upper bound, giving us a powerful tool for quickly assessing defenses on a given dataset. Empirically, we find that even under a simple defense, the MNIST-1-7 and Dogfish datasets are resilient to attack, while in contrast the IMDB sentiment dataset can be driven from 12% to 23% test error by adding only 3% poisoned data.
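The defense family considered above (outlier removal followed by empirical risk minimization) can be illustrated with a minimal sketch. The abstract does not specify the outlier-removal rule, so the centroid-distance filter, the `quantile` threshold, and the helper names `sanitize` and `defend_and_train` below are illustrative assumptions, not the paper's exact defense.

```python
# Sketch of a data-sanitization defense: drop suspected outliers, then run
# empirical risk minimization (here, a regularized linear SVM) on what remains.
# The concrete removal rule (distance to the class centroid) and the radius
# quantile are assumptions for illustration only.
import numpy as np
from sklearn.svm import LinearSVC

def sanitize(X, y, quantile=0.95):
    """Keep only points within a per-class distance threshold of the class centroid."""
    keep = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        centroid = X[idx].mean(axis=0)
        dists = np.linalg.norm(X[idx] - centroid, axis=1)
        radius = np.quantile(dists, quantile)  # hypothetical threshold choice
        keep[idx[dists <= radius]] = True
    return X[keep], y[keep]

def defend_and_train(X, y):
    X_clean, y_clean = sanitize(X, y)
    model = LinearSVC(C=1.0)  # ERM with hinge loss and L2 regularization
    model.fit(X_clean, y_clean)
    return model

if __name__ == "__main__":
    # Toy two-class data; in the paper's setting a fraction of it would be poisoned.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 1, (100, 5)), rng.normal(1, 1, (100, 5))])
    y = np.array([0] * 100 + [1] * 100)
    clf = defend_and_train(X, y)
    print("train accuracy:", clf.score(X, y))
```

The certified bound described in the abstract is an analysis of defenses of this shape, upper-bounding the test loss achievable by any attacker who injects a bounded fraction of poisoned points that survive the sanitization step.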
