Clustering-based semi-supervised machine learning for DDoS attack classification

Abstract

Semi-supervised machine learning can be used to partition an unlabeled or partially labeled dataset into subsets based on applicable dissimilarity metrics. At a later stage, labels are assigned to all of the data according to the observed separation. This paper presents a clustering-based approach for distinguishing network traffic flows that include both normal and Distributed Denial of Service (DDoS) traffic. The features are selected for victim-end identification of attacks, and the work is demonstrated with three features that can be monitored at the target machine. The clustering methods are agglomerative clustering and K-means, with feature extraction performed using Principal Component Analysis (PCA). A voting method is also proposed to label the data and obtain classes that distinguish attacks from normal traffic. After labeling, the supervised machine learning algorithms k-Nearest Neighbors (kNN), Support Vector Machine (SVM) and Random Forest (RF) are applied to obtain trained models for future classification. In the experimental results, the kNN, SVM and RF models achieve accuracy scores of 95%, 92% and 96.66%, respectively, with parameters tuned over given sets of candidate values. Finally, the scheme is validated on a subset of a benchmark dataset containing new attack vectors.
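The abstract outlines a pipeline of clustering (agglomerative and K-means, with PCA), vote-based labeling, and supervised classification, but does not spell out the voting rule or name the three features. The following is a minimal, dependency-free sketch of that kind of pipeline under stated assumptions; it is not the authors' implementation. The assumptions are: three hypothetical per-flow features (packet rate, mean packet size, distinct source IPs), z-score scaling, PCA reduced to a single component via power iteration, average linkage for the agglomerative step, and a hypothetical rule that the cluster with the higher mean packet rate is the attack class. Three clusterings (K-means on scaled features, K-means on the PCA projection, agglomerative on scaled features) then vote per flow. Only kNN is shown for the supervised stage; SVM, RF and parameter tuning are omitted, and in practice a library such as scikit-learn would replace these hand-written routines.

```python
import math
import random
from collections import Counter


def fit_scaler(rows):
    """Per-feature mean and population standard deviation for z-scoring."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # avoid /0
            for c, m in zip(cols, means)]
    return means, stds


def scale(rows, means, stds):
    """Apply z-score scaling so no single feature dominates distances."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def pca_1d(z, iters=100):
    """Project centred rows onto the top principal component (power iteration)."""
    d, n = len(z[0]), len(z)
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [[sum(a * b for a, b in zip(r, v))] for r in z]


def kmeans(rows, k=2, iters=50, seed=0):
    """Lloyd's K-means; returns one cluster index per row."""
    centres = random.Random(seed).sample(rows, k)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(r, centres[c])) for r in rows]
        for c in range(k):
            members = [r for r, a in zip(rows, assign) if a == c]
            if members:  # keep the previous centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def agglomerative(rows, k=2):
    """Average-linkage agglomerative clustering, merged bottom-up to k clusters."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(a, b):
        return sum(math.dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    assign = [0] * len(rows)
    for c, members in enumerate(clusters):
        for m in members:
            assign[m] = c
    return assign


def to_labels(assign, raw, feature=0):
    """HYPOTHETICAL rule: the cluster with the higher mean raw `feature` is 'attack'."""
    means = {}
    for c in set(assign):
        vals = [raw[i][feature] for i, a in enumerate(assign) if a == c]
        means[c] = sum(vals) / len(vals)
    attack = max(means, key=means.get)
    return ["attack" if a == attack else "normal" for a in assign]


def vote_labels(raw):
    """ASSUMED voting scheme: per-flow majority over three clusterings."""
    z = scale(raw, *fit_scaler(raw))
    voters = [kmeans(z), kmeans(pca_1d(z)), agglomerative(z)]
    ballots = [to_labels(assign, raw) for assign in voters]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*ballots)]


def knn_predict(train_z, labels, query_z, k=3):
    """k-nearest-neighbour majority vote in the scaled feature space."""
    order = sorted(range(len(train_z)), key=lambda i: math.dist(train_z[i], query_z))
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]


# HYPOTHETICAL flows: [packets/s, mean packet size in bytes, distinct source IPs].
flows = [[100, 500, 5], [120, 480, 6], [90, 520, 4], [110, 510, 5], [105, 495, 7],
         [5000, 60, 300], [5200, 64, 280], [4800, 62, 350], [5100, 60, 310]]
labels = vote_labels(flows)
means, stds = fit_scaler(flows)
train_z = scale(flows, means, stds)
new_flow = scale([[4900, 70, 290]], means, stds)[0]
print(labels)
print(knn_predict(train_z, labels, new_flow))
```

On deliberately well-separated synthetic flows like these, the three clusterings should agree; the vote only changes the outcome for borderline flows on which they disagree. An odd number of voters is used so that a two-class vote cannot tie.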
