Fast Feature Reduction in Intrusion Detection Datasets

In most intrusion detection systems (IDS), the system tries to learn the characteristics of different types of attacks by analyzing packets sent or received over the network. These packets have many features, but not all of them need to be analyzed to detect a specific type of attack. Detection speed and computational cost are also vital concerns, because the datasets in these problems are typically very large. In this paper we propose a very simple and fast feature selection method that eliminates features carrying no helpful information. Omitting these redundant features results in faster learning. We compared the proposed method with three of the most successful similarity-based feature selection algorithms: Correlation Coefficient, Least Square Regression Error, and Maximal Information Compression Index. We then used the features recommended by each of these algorithms in two popular classifiers, Bayes and KNN, to measure the quality of the recommendations. Experimental results show that although the proposed method cannot outperform the evaluated algorithms by a large margin in accuracy, it has a large advantage over them in computational cost.
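
As an illustration of the three baseline similarity measures named above, the following is a minimal sketch for a single pair of features. It assumes the usual textbook definitions: the Pearson correlation coefficient, the residual variance of a least-squares linear fit of one feature on the other, and the smallest eigenvalue of the 2x2 covariance matrix of the pair. The function names are hypothetical, and this is not the implementation evaluated in the paper, nor the proposed method itself, which the abstract does not specify.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def _var(v):
    # Population variance of one feature column.
    m = _mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def _cov(x, y):
    # Population covariance of two feature columns of equal length.
    mx, my = _mean(x), _mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def correlation_coefficient(x, y):
    # Pearson correlation; defined as 0.0 here when either feature is
    # constant (an assumption made only to avoid division by zero).
    denom = math.sqrt(_var(x) * _var(y))
    return _cov(x, y) / denom if denom > 0 else 0.0


def least_square_regression_error(x, y):
    # Mean squared residual of the best linear fit y ~ a*x + b,
    # which equals var(y) * (1 - rho^2).
    rho = correlation_coefficient(x, y)
    return _var(y) * (1.0 - rho ** 2)


def maximal_information_compression_index(x, y):
    # Smallest eigenvalue of the covariance matrix [[vx, c], [c, vy]].
    vx, vy, c = _var(x), _var(y), _cov(x, y)
    disc = (vx + vy) ** 2 - 4.0 * (vx * vy - c * c)
    return 0.5 * (vx + vy - math.sqrt(max(disc, 0.0)))


if __name__ == "__main__":
    f1 = [1.0, 2.0, 3.0, 4.0]
    f2 = [2.0, 4.0, 6.0, 8.0]  # exact linear function of f1, hence redundant
    print(correlation_coefficient(f1, f2))
    print(least_square_regression_error(f1, f2))
    print(maximal_information_compression_index(f1, f2))
```

Under these definitions, a correlation magnitude near 1, or a regression error or compression index near 0, marks a pair of features as nearly linearly dependent, so one of the two is a candidate for removal.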
