Defending Distributed Classifiers Against Data Poisoning Attacks

Support Vector Machines (SVMs) are vulnerable to targeted manipulations of their training data, such as poisoning attacks and label flips. By carefully manipulating a subset of the training samples, an attacker forces the learner to compute an incorrect decision boundary, thereby causing misclassifications. Given the growing importance of SVMs in engineering and life-critical applications, we develop a novel defense algorithm that improves resistance to such attacks. Local Intrinsic Dimensionality (LID) is a promising metric that characterizes the outlierness of data samples. In this work, we introduce a new approximation of LID, called K-LID, that uses kernel distance in the LID calculation, which allows LID to be computed in high-dimensional transformed spaces. We then introduce a weighted SVM that uses K-LID as a distinguishing characteristic to de-emphasize the effect of suspicious data samples on the SVM decision boundary. Each sample is weighted by how likely its K-LID value is to come from the benign K-LID distribution rather than from the attacked K-LID distribution. Finally, we demonstrate how the proposed defense can be applied to a distributed SVM framework through a case study on an SDR-based surveillance system. Experiments with benchmark data sets show that the proposed defense reduces classification error rates substantially (10% on average).
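The weighting idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions: the kernel is taken to be RBF, LID is estimated with the standard maximum-likelihood estimator over the k nearest neighbors, the benign and attacked K-LID distributions are modelled with a simple Gaussian kernel density estimate, and the weights are scaled by dividing by their maximum. The function names, the bandwidth, and the availability of reference K-LID samples for both distributions (for example, from a simulated attack) are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel_distances(X, gamma):
    """Pairwise distances in the RBF feature space (assumed kernel).

    Uses d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y) = 2 - 2 exp(-gamma ||x - y||^2).
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * sq_dists)
    return np.sqrt(np.maximum(2.0 - 2.0 * K, 0.0))

def lid_mle(dist_matrix, k):
    """Maximum-likelihood LID estimate from each point's k nearest neighbors."""
    D = dist_matrix.astype(float).copy()
    np.fill_diagonal(D, np.inf)                # exclude each point from its own neighbors
    nearest = np.sort(D, axis=1)[:, :k]        # k smallest distances per row
    eps = 1e-12                                # guards against log(0) and division by zero
    log_ratios = np.log((nearest + eps) / (nearest[:, -1:] + eps))
    mean_log = np.minimum(log_ratios.mean(axis=1), -eps)
    return -1.0 / mean_log

def gaussian_kde_pdf(samples, query, bandwidth):
    """Plain Gaussian kernel density estimate evaluated at the query points."""
    z = (query[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

def likelihood_ratio_weights(lid_values, benign_lids, attacked_lids, bandwidth=0.5):
    """Weight each sample by p_benign(lid) / p_attacked(lid), scaled to [0, 1].

    The max-scaling is an assumption of this sketch, not taken from the paper.
    """
    p_benign = gaussian_kde_pdf(benign_lids, lid_values, bandwidth)
    p_attacked = gaussian_kde_pdf(attacked_lids, lid_values, bandwidth)
    ratio = p_benign / (p_attacked + 1e-12)
    return ratio / max(ratio.max(), 1e-12)
```

Weights produced this way could then be passed as per-sample weights to a weighted SVM solver, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`, so that samples whose K-LID looks more like the attacked distribution contribute less to the decision boundary.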
