Defending Support Vector Machines Against Data Poisoning Attacks

Support Vector Machines (SVMs) are vulnerable to targeted training data manipulations such as poisoning attacks and label flips. By carefully manipulating a subset of training samples, an attacker can force the learner to compute an incorrect decision boundary and thereby cause misclassifications. Given the growing use of SVMs in engineering and life-critical applications, we develop a novel defense algorithm that improves resistance to such attacks. Local Intrinsic Dimensionality (LID) is a promising metric that characterizes how much of an outlier a data sample is. In this work, we introduce a new approximation of LID, called K-LID, that uses kernel distances in the LID calculation, allowing LID to be estimated in high-dimensional transformed feature spaces. Using K-LID as a distinguishing characteristic, we then introduce a weighted SVM that de-emphasizes the effect of suspicious data samples on the decision boundary. Each sample is weighted by how likely its K-LID value is to come from the benign K-LID distribution rather than the attacked K-LID distribution. Experiments on benchmark data sets show that the proposed defense reduces classification error rates substantially (by 10% on average).
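To make the weighting idea concrete, below is a minimal, illustrative Python sketch rather than the paper's reference implementation. It assumes an RBF kernel, the standard maximum-likelihood LID estimator applied to kernel-induced distances, kernel density estimates of the benign and attacked K-LID distributions, and scikit-learn's SVC with per-sample weights; the function names, parameter values, synthetic data, and the way the two reference K-LID samples are obtained are all assumptions made for illustration.

import numpy as np
from scipy.stats import gaussian_kde
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def k_lid(X, k=20, gamma=0.1):
    # K-LID: the MLE LID estimator applied to kernel-induced distances
    # d(x, y) = sqrt(k(x, x) - 2 k(x, y) + k(y, y)) instead of Euclidean ones.
    K = rbf_kernel(X, gamma=gamma)
    diag = np.diag(K)
    d = np.sqrt(np.maximum(diag[:, None] - 2.0 * K + diag[None, :], 0.0))
    lids = np.empty(len(X))
    for i in range(len(X)):
        r = np.sort(d[i])[1:k + 1]          # k nearest neighbours, self excluded
        r = np.maximum(r, 1e-12)            # guard against zero distances
        m = np.mean(np.log(r / r[-1]))      # (1/k) * sum log(r_i / r_max)
        lids[i] = -1.0 / min(m, -1e-12)     # LID estimate, kept finite
    return lids


def lid_weights(lids, benign_ref, attacked_ref):
    # Weight = how likely a K-LID value is under the benign distribution
    # relative to the attacked one; values near 0 mark suspicious samples.
    p_benign = gaussian_kde(benign_ref)(lids)
    p_attack = gaussian_kde(attacked_ref)(lids)
    return p_benign / (p_benign + p_attack + 1e-12)


# Tiny synthetic demonstration; the data and the use of the crafted poison set
# itself as the "attacked" reference sample are illustrative assumptions only.
rng = np.random.default_rng(0)
X_clean = rng.normal(size=(200, 5))
y_clean = (X_clean[:, 0] > 0).astype(int)
X_poison = rng.normal(loc=3.0, size=(40, 5))      # crude stand-in for poisoned points
y_poison = 1 - (X_poison[:, 0] > 0).astype(int)   # flipped labels
X_train = np.vstack([X_clean, X_poison])
y_train = np.concatenate([y_clean, y_poison])

weights = lid_weights(k_lid(X_train), k_lid(X_clean), k_lid(X_poison))
clf = SVC(kernel="rbf", gamma=0.1).fit(X_train, y_train, sample_weight=weights)

Because each weight is a likelihood ratio squashed into [0, 1], samples whose K-LID values resemble those of attacked data contribute little to the fitted decision boundary, while benign-looking samples retain close to full weight.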
