Network intrusion detection using machine learning anomaly detection algorithms

Network attacks are exceptional events that do not appear in normal traffic behavior. In this work, we design and implement a new semi-supervised anomaly detection system based on the k-means algorithm to detect network attacks. During the training phase, normal samples are separated into clusters by applying k-means. A threshold value is then calculated from a validation dataset, based on the samples' distances from the cluster centers, so that normal and abnormal samples can be distinguished. A new sample whose distance from the cluster centers exceeds the threshold is detected as an anomaly. We tested the effectiveness of our method on NSL-KDD, a labelled dataset of network connection traces. The experimental results on the NSL-KDD dataset show that we achieved an accuracy of 80.119%.
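The pipeline described above (cluster normal traffic, derive a distance threshold from validation data, flag distant samples) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the data is a synthetic 2-D toy set rather than NSL-KDD, the anomaly score is assumed to be the distance to the nearest cluster center, and the threshold is assumed to be the candidate value that maximizes validation accuracy, since the abstract does not state how it was calculated.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns k centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[idx].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def score(point, centers):
    """Assumed anomaly score: distance to the nearest cluster center."""
    return min(math.dist(point, c) for c in centers)


def pick_threshold(centers, val_points, val_labels):
    """Assumed rule: pick the score value that maximizes validation accuracy.
    val_labels holds True for attack samples and False for normal ones."""
    scores = [score(p, centers) for p in val_points]
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):
        acc = sum((s > t) == y for s, y in zip(scores, val_labels)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def is_anomaly(point, centers, threshold):
    """Flag a sample whose score exceeds the threshold."""
    return score(point, centers) > threshold


# Synthetic toy data (illustrative only, not NSL-KDD).
rng = random.Random(1)


def blob(cx, cy, n):
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]


train = blob(0, 0, 40) + blob(10, 10, 40)          # normal samples only
val_points = blob(0, 0, 10) + blob(10, 10, 10) + [(5.0, 5.0), (20.0, -3.0)]
val_labels = [False] * 20 + [True] * 2             # last two are "attacks"

centers = kmeans(train, k=2)
threshold = pick_threshold(centers, val_points, val_labels)
print(is_anomaly((50.0, 50.0), centers, threshold))
```

Training uses only normal samples, while labels are needed only for the validation set that fixes the threshold, which is what makes the scheme semi-supervised. On real data the features would also need encoding and scaling before distances are meaningful; that step is omitted here.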
