Data sanitization against adversarial label contamination based on data complexity

Machine learning techniques may suffer from adversarial attacks, in which an attacker misleads the learning process by manipulating training samples. Data sanitization is one countermeasure against such poisoning attacks: a data pre-processing step that filters suspect samples before learning. A number of data sanitization methods have recently been devised for label flip attacks, but their flexibility is limited by the specific assumptions they make. We observe that the abrupt label flips caused by an attack change the complexity of the classification problem. This paper proposes a data sanitization method based on data complexity, a measure of how difficult a dataset is to classify. Our method measures the data complexity of the training set after removing a sample together with its nearest neighbors. Contaminated samples are then distinguished from untainted ones by their data complexity values. Experimental results support the idea that data complexity can be used to identify attack samples. The proposed method achieves better detection accuracy than existing sanitization methods on well-known security application problems.

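Since the abstract describes the procedure only at a high level, the sketch below illustrates the idea in Python. It uses Fisher's discriminant ratio (the F1 measure from Ho and Basu's complexity suite) as the complexity measure and assumes binary 0/1 labels; the helper names (`fisher_ratio`, `complexity_scores`, `sanitize`) and the parameters `k` and `quantile` are illustrative choices, not the paper's actual implementation. Because F1 grows as the classes become more separable, a neighborhood whose removal raises F1 was making the problem look harder, which is the footprint a cluster of flipped labels leaves.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def fisher_ratio(X, y):
    """Fisher's discriminant ratio (F1) for binary labels y in {0, 1}.

    Higher values mean more separable classes, i.e. LOWER complexity.
    """
    c0, c1 = X[y == 0], X[y == 1]
    num = (c0.mean(axis=0) - c1.mean(axis=0)) ** 2
    den = c0.var(axis=0) + c1.var(axis=0) + 1e-12  # guard against zero variance
    return (num / den).max()


def complexity_scores(X, y, k=5):
    """Score each sample by how separability changes when the sample
    and its k nearest neighbors are removed from the training set."""
    # idx[i] holds sample i itself plus its k nearest neighbors.
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    base = fisher_ratio(X, y)
    scores = np.empty(len(X))
    for i in range(len(X)):
        mask = np.ones(len(X), dtype=bool)
        mask[idx[i]] = False               # drop the neighborhood of sample i
        # Assumes each class still has members after the removal.
        scores[i] = fisher_ratio(X[mask], y[mask]) - base
    return scores


def sanitize(X, y, k=5, quantile=0.9):
    """Discard the samples whose removal most improves separability;
    their neighborhoods are the most likely to contain flipped labels."""
    scores = complexity_scores(X, y, k)
    keep = scores < np.quantile(scores, quantile)
    return X[keep], y[keep]
```

Thresholding the scores at a fixed quantile stands in for the paper's decision rule on complexity values; in practice the cutoff would be tuned, for example against the score distribution on a trusted clean subset.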