Research Challenges and Performance of Clustering Techniques to Analyze NSL-KDD Dataset

Malicious activity on the Internet poses major challenges to the research community as well as to corporations. Many data mining techniques have been adopted to address these challenges, including classification, clustering, association rule mining, regression, and visualization. Among these, clustering provides a better representation of network traffic, which helps identify the type of data flowing through the network. Clustering algorithms are most widely used as unsupervised classifiers to organize and categorize data. In this paper we analyze four clustering algorithms on the NSL-KDD dataset. We cluster the dataset into two classes, normal and anomaly, using K-means, EM, density-based (DB) clustering, and COBWEB. The main objective of this evaluation is to determine the class labels of the different types of data present in the intrusion detection dataset and to identify an efficient clustering algorithm. The results of the evaluation are compared, and the challenges faced in these evaluations are then discussed.
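As an illustration of the two-class clustering setup described above, the following is a minimal sketch of K-means with k = 2, scored by mapping each cluster to its majority class. It is not the implementation used in the paper. The two-feature synthetic points stand in for NSL-KDD records, and the names `kmeans` and `cluster_accuracy` are hypothetical helpers introduced only for this example.

```python
import random
from collections import Counter


def kmeans(points, k=2, iters=50, seed=0):
    """Plain Lloyd-style K-means over a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    assign = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        new_assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_assign == assign:  # assignments stopped changing
            break
        assign = new_assign
    return assign


def cluster_accuracy(assign, labels):
    """Label each cluster with its majority class and return the fraction matched."""
    correct = 0
    for c in set(assign):
        member_labels = [l for a, l in zip(assign, labels) if a == c]
        correct += Counter(member_labels).most_common(1)[0][1]
    return correct / len(labels)


# Synthetic stand-in data (an assumption, not NSL-KDD): two separated blobs.
rng = random.Random(42)
points, labels = [], []
for _ in range(50):
    points.append((rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)))
    labels.append("normal")
for _ in range(50):
    points.append((rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)))
    labels.append("anomaly")

assign = kmeans(points, k=2)
acc = cluster_accuracy(assign, labels)
print(f"majority-label accuracy: {acc:.2f}")
```

The class labels are used only for scoring, never for clustering, which mirrors the unsupervised setting. On real intrusion data the features would need encoding and scaling first, and the clusters would be far less cleanly separated than in this toy example.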
