Using Data Mining Algorithms for Developing a Model for Intrusion Detection System (IDS)

Abstract A common problem shared by current IDS is the high false positives and low detection rate. An unsupervised machine learning using k-means was used to propose a model for Intrusion Detection System (IDS) with higher efficiency rate and low false positives and false negatives. The NSL-KD data set was used which consisted of 25,192 entries with 22 different types of data. Results of the study using 11, 22, 44, 66 and 88 clusters, showed an efficiency rate of 70.75%, 81.61%, 65.40%, 61.30% and 55.43% respectively; false positive rates of 0.74%, 4.03%, 15.55%, 21.47% and 31.91% respectively; and false negative rates of 99.82%, 98.14%, 97.76%, 96.32% and 95.70%, respectively. Interestingly, the best results were generated when the number of clusters matches the number of data types in the data set. In the light of the findings, it is recommended that other data mining techniques be explored; a study using k-means data mining algorithm followed by signature-based approach is proposed in order to lessen the false negative rate; and a system for automatically identifying the number of clusters may be developed.

[1]  Horst Bischof,et al.  MDL Principle for Robust Vector Quantisation , 1999, Pattern Analysis & Applications.

[2]  Mohammad Khubeb Siddiqui,et al.  Analysis of KDD CUP 99 Dataset using Clustering based Data Mining , 2013 .

[3]  Ali A. Ghorbani,et al.  A detailed analysis of the KDD CUP 99 data set , 2009, 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications.

[4]  Xin Xu Adaptive Intrusion Detection Based on Machine Learning: Feature Extraction, Classifier Construction and Sequential Pattern Prediction , 2006 .

[5]  Adetunmbi A. Olusola,et al.  Analysis of KDD '99 Intrusion Detection Dataset for Selection of Relevance Features , 2010 .

[6]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[7]  Deokjai Choi,et al.  Application of Data Mining to Network Intrusion Detection: Classifier Selection Model , 2008, APNOMS.

[8]  Malcolm I. Heywood,et al.  Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD 99 , 2005, PST.

[9]  Jian Pei,et al.  Data Mining: Concepts and Techniques, 3rd edition , 2006 .

[10]  Andrew H. Sung,et al.  Intrusion detection using neural networks and support vector machines , 2002, Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN'02 (Cat. No.02CH37290).

[11]  Salvatore J. Stolfo,et al.  A framework for constructing features and models for intrusion detection systems , 2000, TSEC.

[12]  Salvatore J. Stolfo,et al.  Data Mining Approaches for Intrusion Detection , 1998, USENIX Security Symposium.

[13]  Karen A. Scarfone,et al.  Guide to Intrusion Detection and Prevention Systems (IDPS) , 2007 .

[14]  Peter Mell Understanding Intrusion Detection Systems , 2001 .

[15]  Heikki Mannila,et al.  Discovering Frequent Episodes in Sequences , 1995, KDD.

[16]  T. Lane,et al.  Sequence Matching and Learning in Anomaly Detection for Computer Security , 1997 .