Unsupervised Feature Selection Method for Intrusion Detection System

This paper addresses feature selection for data classification when class labels are unavailable. It proposes an unsupervised feature selection algorithm, the Extended Laplacian score (EL), which enhances the Laplacian score method. EL selects features in two phases. In the first phase, the Laplacian score algorithm selects the features with the best locality-preserving power. In the second phase, a Redundancy Penalization (RP) technique based on mutual information eliminates redundancy among the selected features. RP improves on Battiti's MIFS in that it does not require a user-defined parameter, such as the beta that MIFS needs, to select from the candidate feature set. The final selected subset is then used to build an intrusion detection system. The effectiveness and feasibility of the proposed detection system are evaluated on three well-known intrusion detection datasets: KDD Cup 99, NSL-KDD, and Kyoto 2006+. The evaluation results show that the proposed feature selection approach achieves higher classification accuracy than the Laplacian score method.
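As a rough illustration of the first phase, the standard Laplacian score can be sketched as below. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the function name `laplacian_scores`, the neighborhood size `n_neighbors`, the heat-kernel width `t`, and the synthetic data are all hypothetical choices made for illustration. A lower score indicates a feature that better preserves the local neighborhood structure of the data.

```python
import numpy as np


def laplacian_scores(X, n_neighbors=5, t=1.0):
    """Return one Laplacian score per column of X (lower is better).

    Assumes n_neighbors < number of rows. Uses a symmetrized k-nearest-
    neighbor graph with heat-kernel weights exp(-||xi - xj||^2 / t).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # k-nearest-neighbor adjacency, excluding each sample itself.
    d2_noself = d2.copy()
    np.fill_diagonal(d2_noself, np.inf)
    nn = np.argsort(d2_noself, axis=1)[:, :n_neighbors]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), n_neighbors), nn.ravel()] = True
    A = A | A.T  # make the graph undirected

    # Heat-kernel weights on graph edges, degree vector, graph Laplacian.
    S = np.where(A, np.exp(-d2 / t), 0.0)
    deg = S.sum(axis=1)
    L = np.diag(deg) - S

    # Remove the degree-weighted mean from every feature.
    F = X - (deg @ X) / deg.sum()

    # Score_r = (f_r^T L f_r) / (f_r^T D f_r) for each centered feature f_r.
    num = np.sum(F * (L @ F), axis=0)
    den = np.sum(F * (deg[:, None] * F), axis=0)
    return num / np.maximum(den, 1e-12)


# Synthetic data (an assumption for illustration): two clusters that are
# separated along columns 0 and 1, plus one pure-noise column.
rng = np.random.default_rng(0)
labels = np.repeat([0.0, 1.0], 30)
X_demo = np.column_stack([
    10.0 * labels + rng.normal(0.0, 0.5, 60),
    -10.0 * labels + rng.normal(0.0, 0.5, 60),
    rng.normal(0.0, 0.5, 60),
])
scores_demo = laplacian_scores(X_demo)
print("scores:", scores_demo)
print("features ranked best to worst:", np.argsort(scores_demo))
```

On data like this, the two cluster-separating columns should receive lower scores than the noise column. The second phase (RP) is not sketched because the abstract does not state its formula.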
