Comparison and Detection Analysis of Network Traffic Datasets Using K-Means Clustering Algorithm

Anomaly detection in a dataset involves detecting circumstances that are uncommon in otherwise homogeneous data. With network traffic data, it is generally difficult to determine the type of attack without proper analysis, particularly when malicious activity must be found by inspecting a traffic record that covers thousands of internet users. Datasets also differ in how they record or acquire data. This paper compares and analyses multiple datasets, including NSL-KDD and MAWI, using the K-means clustering algorithm. Specifically, it analyses the blind spots of the datasets and evaluates which dataset is most suitable for the K-means clustering algorithm. The quantitative results help evaluate the weaknesses of each traffic dataset when it is used with the K-means clustering algorithm.
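As a rough illustration of how K-means can surface uncommon traffic records, the sketch below clusters a handful of made-up two-feature records and scores each one by its distance to the nearest centroid. The feature values, the helper names `kmeans` and `anomaly_scores`, the first-k-points initialisation, and the distance-based scoring rule are all assumptions chosen for illustration; they are not the procedure, features, or data used in the paper.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd-style K-means on a list of equal-length numeric tuples.

    Assumption: centroids are initialised from the first k points,
    a simplification that keeps the example deterministic.
    """
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

def anomaly_scores(points, centroids):
    """Score each point by its distance to the nearest centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]

# Hypothetical, already-scaled traffic features (for example duration and
# byte count). Two dense groups of ordinary records plus one unusual record.
records = [
    (1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (5.1, 4.9),
    (0.9, 1.1), (4.9, 5.2), (1.1, 1.0), (5.0, 5.1),
    (9.0, 1.0),  # the unusual record
]

centroids = kmeans(records, k=2)
scores = anomaly_scores(records, centroids)
most_anomalous = records[scores.index(max(scores))]
print(most_anomalous)
```

In this toy data the unusual record is absorbed into one of the two clusters but remains far from that cluster's centroid, which is why a distance-based score singles it out. Real datasets such as NSL-KDD also contain categorical fields, which would need to be encoded and scaled before a Euclidean distance is meaningful.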
