Why Does Regularization Help with Mitigating Poisoning Attacks?

We use distributionally-robust optimization for machine learning to mitigate the effect of data poisoning attacks. We provide performance guarantees for the trained model on the original data (excluding the poison records) by training the model on the worst-case distribution in a neighbourhood, defined by the Wasserstein distance, around the empirical distribution extracted from the (potentially poisoned) training dataset. We relax the distributionally-robust machine-learning problem by upper-bounding the worst-case fitness by the empirical sample-averaged fitness plus a regularizer: the Lipschitz constant of the fitness function with respect to the data for fixed model parameters. For regression models, we prove that this regularizer equals the dual norm of the model parameters.
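A minimal sketch of this relaxation for linear regression follows, assuming an absolute-value loss and a 2-norm transport cost (under which the dual norm is again the 2-norm, so the regularizer reduces to eps times the Euclidean norm of the parameters). The radius eps, the subgradient-descent settings, and the synthetic poisoned data are illustrative assumptions, not the formulation or experiments of the paper.

import numpy as np

def regularized_objective(theta, X, y, eps):
    """Upper bound on the worst-case fitness over a Wasserstein ball of
    radius eps: sample-averaged absolute loss + eps * ||theta||_2."""
    return np.mean(np.abs(y - X @ theta)) + eps * np.linalg.norm(theta, 2)

def fit(X, y, eps=0.1, lr=0.05, iters=5000, seed=0):
    """Subgradient descent with a diminishing step size on the relaxed objective."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = rng.normal(scale=0.01, size=d)
    for t in range(iters):
        r = y - X @ theta
        g_loss = -(X.T @ np.sign(r)) / n                   # subgradient of mean |r|
        norm = np.linalg.norm(theta, 2)
        g_reg = theta / norm if norm > 0 else np.zeros(d)  # subgradient of ||theta||_2
        theta -= (lr / np.sqrt(t + 1.0)) * (g_loss + eps * g_reg)
    return theta

# Illustrative usage: synthetic regression data with 5% of the labels poisoned.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=200)
y[:10] += 25.0                                             # crude label poisoning
print(np.round(fit(X, y, eps=0.1), 2))

Larger values of eps hedge against a larger poisoning budget (a bigger Wasserstein ball) at the price of additional bias on clean data.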
