Machine-Learning-Based Feature Selection Techniques for Large-Scale Network Intrusion Detection

Cyber-attacks on major Internet sites and enterprise networks are increasingly common. An intrusion detection system (IDS) is a critical component of the defense mechanisms for such infrastructure: it monitors and analyzes network activity for potential intrusions and security attacks. Machine-learning (ML) models have been well accepted for signature-based IDSs because of their learnability and flexibility. However, the performance of existing IDSs does not seem to be satisfactory, owing to the rapid evolution of sophisticated cyber threats in recent decades. Moreover, the volume of data to be analyzed is beyond the ability of commonly used computer software and hardware tools. The data are not only large in scale but also arrive and depart at high velocity. In a big-data IDS, one must therefore find an efficient way to reduce both the dimensionality and the volume of the data. In this paper, we propose two novel feature selection methods, namely RF-FSR (Random Forest-Forward Selection Ranking) and RF-BER (Random Forest-Backward Elimination Ranking). The features selected by the proposed methods were tested and compared with three of the most well-known feature sets in the IDS literature. The experimental results showed that the features selected by the proposed methods improved both the detection rate and the false-positive rate, achieving 99.8% and 0.001%, respectively, on the well-known KDD-99 dataset.
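The abstract names forward selection and backward elimination over a feature ranking but does not spell out the procedures. The sketch below shows one generic, rank-guided form of each wrapper method and should not be read as the authors' RF-FSR or RF-BER algorithms. It assumes the ranking is supplied as an ordered list (in the paper it presumably comes from a random forest, which is not reproduced here). The scoring function `toy_score` and its weights in `USEFUL` are hypothetical stand-ins for a real cross-validated detection score; only the feature names are taken from KDD-99.

```python
from typing import Callable, List, Sequence

# A scorer maps a candidate feature subset to a quality value (higher is better).
Scorer = Callable[[Sequence[str]], float]


def forward_selection(ranked: Sequence[str], score: Scorer) -> List[str]:
    """Add features in ranked order, keeping each one only if the score improves."""
    selected: List[str] = []
    best = float("-inf")
    for feature in ranked:
        candidate = selected + [feature]
        s = score(candidate)
        if s > best:
            selected, best = candidate, s
    return selected


def backward_elimination(ranked: Sequence[str], score: Scorer) -> List[str]:
    """Start from all features and try dropping them from lowest to highest rank.

    A drop is kept only if the score does not fall.
    """
    selected = list(ranked)
    best = score(selected)
    for feature in reversed(ranked):
        if len(selected) == 1:
            break  # never return an empty feature set
        candidate = [f for f in selected if f != feature]
        s = score(candidate)
        if s >= best:
            selected, best = candidate, s
    return selected


# Hypothetical usefulness weights, for illustration only (not measured values).
USEFUL = {"src_bytes": 0.5, "dst_bytes": 0.3, "count": 0.1}


def toy_score(features: Sequence[str]) -> float:
    """Toy stand-in for a validation score: reward useful features, penalize size."""
    return sum(USEFUL.get(f, 0.0) for f in features) - 0.02 * len(features)


# Assumed ranking, most important first.
ranked = ["src_bytes", "dst_bytes", "count", "duration", "land"]

forward_result = forward_selection(ranked, toy_score)
backward_result = backward_elimination(ranked, toy_score)
print(forward_result)
print(backward_result)
```

In practice the scorer would train and validate a classifier on the candidate subset, so each call is expensive. Walking the ranked list once keeps the number of evaluations linear in the number of features, rather than exploring all subsets.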
