A Hybrid Data Mining Approach for Intrusion Detection on Imbalanced NSL-KDD Dataset

Intrusion detection systems aim to detect malicious activity in computer and network traffic, a task that a conventional firewall cannot perform. Most intrusion detection systems are built on machine learning techniques. Because the datasets used in intrusion detection are imbalanced, previous methods detect the two attack classes R2L and U2R less accurately than the normal class and the other attack classes. To overcome this issue, this study employs a hybrid approach that combines the synthetic minority oversampling technique (SMOTE) with cluster center and nearest neighbor (CANN). Important features are selected using the leave-one-out (LOO) method. The approach is evaluated on the NSL-KDD dataset. Results indicate that the proposed method improves the accuracy of detecting U2R and R2L attacks in comparison to the baseline paper by 94% and 50%, respectively.
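The two building blocks named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`smote`, `cann_feature`), the parameter `k`, the toy two-dimensional points, and the choice to sum distances to all cluster centers are assumptions made for illustration only. SMOTE is shown as interpolation between a minority sample and one of its nearest minority neighbors, and the CANN-style feature as the sum of distances to the cluster centers plus the distance to the nearest neighbor in the same cluster.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    Assumes the minority class holds at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def cann_feature(point, centers, cluster_members):
    """One-dimensional CANN-style feature (hypothetical helper).

    Sums the distances from the point to the given cluster centers and
    adds the distance to its nearest neighbor among the other members
    of its own cluster.
    """
    to_centers = sum(euclidean(point, c) for c in centers)
    others = [p for p in cluster_members if p is not point]
    to_neighbor = min(euclidean(point, p) for p in others)
    return to_centers + to_neighbor


# Toy stand-in for a rare class such as U2R (made-up points, not NSL-KDD data).
rare_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
oversampled = rare_class + smote(rare_class, n_new=4)
```

In a pipeline of this kind, oversampling would be applied to the training split only, and the one-dimensional feature would then be passed to a nearest-neighbor classifier.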
