Fuzzy classification boundaries against adversarial network attacks

Abstract: Adversarial machine learning is concerned with developing methods that prevent machine learning algorithms from being misled by malicious users. The field is especially relevant for applications in which machine learning lies at the core of security systems. In network security, adversarial samples are in practice novel network attacks or old attacks with tuned properties. This paper proposes blurring classification boundaries in order to enhance the robustness of machine learning and improve the detection of adversarial samples that exploit learning weaknesses. We test this concept in an experimental setup with network traffic in which linear decision trees are wrapped by a one-class-membership scoring algorithm. We benchmark our proposal against plain linear decision trees and fuzzy decision trees. Results show that evasive attacks (i.e., false negatives) tend to be ranked with low class-membership levels, meaning that they lie in zones close to classification thresholds. In addition, classification performance improves when membership scores are added as new features. Using fuzzy class boundaries is highly consistent with the interpretation of many network traffic features used for malware detection; moreover, it prevents network attackers from exploiting classification boundaries as attack objectives.
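The idea of wrapping a crisp classifier with a class-membership score can be illustrated with a minimal sketch. This is not the authors' scoring algorithm: the Gaussian-shaped membership function, the single-feature threshold rule standing in for a decision tree, the toy values, and the names THRESHOLD, TRAINING, class_membership, predict_with_membership, and augment are all assumptions made for illustration only.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data: one traffic feature, two classes (0 = benign, 1 = attack).
TRAINING = {0: [1.0, 2.0, 3.0], 1: [7.0, 8.0, 9.0]}
THRESHOLD = 5.0  # assumed crisp decision boundary, standing in for a tree split


def class_membership(x, class_samples):
    """Membership in (0, 1] of x with respect to one class.

    Assumption: a Gaussian-shaped score around the class mean; the paper's
    actual scoring algorithm is not specified in the abstract.
    """
    mu = mean(class_samples)
    sigma = pstdev(class_samples) or 1.0  # guard against zero spread
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def predict_with_membership(x, threshold, samples_by_class):
    """Return the crisp label plus the membership score for that label."""
    label = 1 if x > threshold else 0
    return label, class_membership(x, samples_by_class[label])


def augment(features, score):
    """Append the membership score to a feature vector as a new feature."""
    return list(features) + [score]


# An evasive attack tuned to sit just below the threshold is labelled benign,
# but its membership in the benign class is low, which flags it as suspicious.
evasive_label, evasive_score = predict_with_membership(4.8, THRESHOLD, TRAINING)
typical_label, typical_score = predict_with_membership(2.0, THRESHOLD, TRAINING)
print(evasive_label, evasive_score)
print(typical_label, typical_score)
```

In this sketch both samples receive the same crisp label, yet the one near the threshold gets a much lower membership score than the one at the center of the benign class. Passing the output of augment to a second classifier would correspond to using membership scores as additional features.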
