Linear Correlation-Based Feature Selection for Network Intrusion Detection Model

Feature selection is a preprocessing phase to machine learning, which leads to increase the classification accuracy and reduce its complexity. However, the increase of data dimensionality poses a challenge to many existing feature selection methods. This paper formulates and validates a method for selecting optimal feature subset based on the analysis of the Pearson correlation coefficients. We adopt the correlation analysis between two variables as a feature goodness measure. Where, a feature is good if it is highly correlated to the class and is low correlated to the other features. To evaluate the proposed Feature selection method, experiments are applied on NSL-KDD dataset. The experiments shows that, the number of features is reduced from 41 to 17 features, which leads to improve the classification accuracy to 99.1%. Also,The efficiency of the proposed linear correlation feature selection method is demonstrated through extensive comparisons with other well known feature selection methods.

[1]  Marc Dacier,et al.  Towards a taxonomy of intrusion-detection systems , 1999, Comput. Networks.

[2]  William H. Press,et al.  Numerical recipes in C , 2002 .

[3]  Yinhui Li,et al.  An efficient intrusion detection system based on support vector machines and gradually feature removal method , 2012, Expert Syst. Appl..

[4]  David G. Stork,et al.  Pattern Classification , 1973 .

[5]  Huan Liu,et al.  Feature selection for clustering - a filter solution , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[6]  Ahmed A. Elngar,et al.  A Real-Time Anomaly Network Intrusion Detection System with High Accuracy , 2013 .

[7]  Huan Liu,et al.  Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution , 2003, ICML.

[8]  V.V. Phoha,et al.  Dimension reduction using feature extraction methods for real-time misuse detection systems , 2004, Proceedings from the Fifth Annual IEEE SMC Information Assurance Workshop, 2004..

[9]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[10]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Sam Kwong,et al.  Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection , 2007, Pattern Recognition.

[12]  Ah-Hwee Tan,et al.  Data Mining for Biomedical Applications , 2006, Lecture Notes in Computer Science.

[13]  Daphne Koller,et al.  Toward Optimal Feature Selection , 1996, ICML.

[14]  L. N. Kanal,et al.  Handbook of Statistics, Vol. 2. Classification, Pattern Recognition and Reduction of Dimensionality. , 1985 .

[15]  Wei-Yang Lin,et al.  Intrusion detection by machine learning: A review , 2009, Expert Syst. Appl..

[16]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[17]  Ali A. Ghorbani,et al.  A detailed analysis of the KDD CUP 99 data set , 2009, 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications.

[18]  John H. Holland,et al.  Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .

[19]  F. A. Seiler,et al.  Numerical Recipes in C: The Art of Scientific Computing , 1989 .

[20]  Huan Liu,et al.  Efficient Feature Selection via Analysis of Relevance and Redundancy , 2004, J. Mach. Learn. Res..

[21]  Mohamed Ben Ahmed,et al.  Intrusion detection based on “Hybrid” propagation in Bayesian Networks , 2009, 2009 IEEE International Conference on Intelligence and Security Informatics.

[22]  David G. Stork,et al.  Pattern Classification (2nd ed.) , 1999 .

[23]  D. J. Hand,et al.  Artificial intelligence , 1981, Psychological Medicine.

[24]  Xin Jin,et al.  Machine Learning Techniques and Chi-Square Feature Selection for Cancer Classification Using SAGE Gene Expression Profiles , 2006, BioDM.

[25]  Nasser Yazdani,et al.  Mutual information-based feature selection for intrusion detection systems , 2011, J. Netw. Comput. Appl..

[26]  David G. Stork,et al.  Pattern classification, 2nd Edition , 2000 .

[27]  Filippo Menczer,et al.  Feature selection in unsupervised learning via evolutionary search , 2000, KDD '00.