Paper Information - Defending Distributed Classifiers Against Data Poisoning Attacks

Support Vector Machines (SVMs) are vulnerable to targeted training data manipulations such as poisoning attacks and label flips. By carefully manipulating a subset of training samples, the attacker forces the learner to compute an incorrect decision boundary, thereby causing misclassifications. Considering the increased importance of SVMs in engineering and life-critical applications, we develop a novel defense algorithm that improves resistance against such attacks. Local Intrinsic Dimensionality (LID) is a promising metric that characterizes the outlierness of data samples. In this work, we introduce a new approximation of LID called K-LID that uses kernel distance in the LID calculation, which allows LID to be calculated in high-dimensional transformed spaces. Using K-LID as a distinguishing characteristic, we introduce a weighted SVM that de-emphasizes the effect of suspicious data samples on the SVM decision boundary. Each sample is weighted by how likely its K-LID value is to come from the benign K-LID distribution rather than the attacked K-LID distribution. We then demonstrate how the proposed defense can be applied to a distributed SVM framework through a case study on an SDR-based surveillance system. Experiments with benchmark data sets show that the proposed defense reduces classification error rates substantially (10% on average).
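The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact K-LID formula, so the code below assumes the standard maximum-likelihood LID estimator applied to the feature-space distance induced by an RBF kernel, and it assumes the weight is a normalized likelihood ratio of two Gaussian kernel density estimates. The function names (`kernel_distance`, `k_lid`, `gaussian_kde`, `benign_weight`) and the parameters `gamma`, `k`, and `bandwidth` are all hypothetical choices made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel value k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def kernel_distance(x, y, gamma=0.5):
    """Distance in the kernel-induced feature space (assumed form).

    For an RBF kernel k(x, x) = 1, so the squared distance is 2 - 2 k(x, y).
    """
    return math.sqrt(max(0.0, 2.0 - 2.0 * rbf_kernel(x, y, gamma)))


def k_lid(x, others, k=5, gamma=0.5):
    """Maximum-likelihood LID estimate computed from kernel distances.

    Uses the k smallest non-zero distances r_1 <= ... <= r_k and returns
    -1 / mean(log(r_i / r_k)). This estimator is an assumption here.
    """
    dists = sorted(kernel_distance(x, o, gamma) for o in others)
    dists = [d for d in dists if d > 0.0][:k]
    if not dists:
        return float("inf")
    r_max = dists[-1]
    mean_log = sum(math.log(d / r_max) for d in dists) / len(dists)
    if mean_log == 0.0:  # all neighbours equidistant: estimate is unbounded
        return float("inf")
    return -1.0 / mean_log


def gaussian_kde(samples, bandwidth):
    """Return a one-dimensional Gaussian kernel density estimate as a function."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(v):
        return norm * sum(math.exp(-0.5 * ((v - s) / bandwidth) ** 2) for s in samples)

    return pdf


def benign_weight(lid, pdf_benign, pdf_attacked, eps=1e-12):
    """Weight in [0, 1]: close to 1 when the K-LID value looks benign."""
    b = pdf_benign(lid)
    a = pdf_attacked(lid)
    return (b + eps) / (a + b + 2.0 * eps)


# Toy usage: K-LID of one point on a line, then weights from made-up
# benign and attacked K-LID samples (illustrative numbers only).
points = [(i * 0.1, 0.0) for i in range(20)]
lid_value = k_lid(points[0], points[1:], k=5)
pdf_b = gaussian_kde([1.0, 1.2, 0.9], bandwidth=0.3)
pdf_a = gaussian_kde([4.8, 5.0, 5.3], bandwidth=0.3)
weights = [benign_weight(v, pdf_b, pdf_a) for v in (1.0, 5.0)]
```

Weights of this kind could then be supplied to a weighted SVM solver, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`, so that samples whose K-LID values look attacked contribute less to the decision boundary.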
