Intrusion feature selection using Modified Heuristic Greedy Algorithm of Itemset

This paper proposes the Modified Heuristic Greedy Algorithm of Itemset (MHGIS) as a feature selection method for network intrusion data. The method can be used as an alternative way to obtain suitable attributes for this domain. MHGIS is modified from the original Heuristic Greedy Algorithm of Itemset (HGIS) to increase its efficiency in finding suitable features. We compare our results with a common feature selection method, Chi-Square (Chi2) feature selection. Our experiment has four main steps. First, we pre-process the data to discard unnecessary attributes. Second, we apply MHGIS and Chi2 feature selection to the pre-processed data to reduce the number of attributes. Third, we measure recognition performance using four supervised learning algorithms: C4.5, BPNN, RBF, and SVM. Last, we evaluate the results obtained with MHGIS and Chi2. From the KDDCup99 dataset, we randomly sampled 13,499 patterns with 34 dimensions. MHGIS and Chi2 yield 14 and 26 features, respectively. The results show that C4.5 applied to the MHGIS-selected features produces better classification accuracy than Chi2 feature selection and HGIS feature selection across all of the classification methods.
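To make the Chi2 baseline step more concrete, the following is a minimal sketch of ranking categorical attributes by the Pearson chi-square statistic against the class label and keeping the top k. It is an illustration only: the function names `chi_square_score` and `select_top_k`, the toy records, and the choice of k are assumptions, and the plain ranking shown here is not necessarily the exact Chi2 procedure used in the paper. MHGIS itself is not sketched because the abstract does not describe its steps.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic between one categorical feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    value_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for value, value_count in value_totals.items():
        for label, label_count in label_totals.items():
            # Expected count if the feature and the label were independent.
            expected = value_count * label_count / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score


def select_top_k(rows, labels, k):
    """Return the (sorted) indices of the k highest-scoring features, plus all scores."""
    n_features = len(rows[0])
    scores = [
        chi_square_score([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical toy records (not KDDCup99): protocol, connection flag, noise attribute.
rows = [
    ("tcp", "SF", "a"),
    ("tcp", "SF", "b"),
    ("udp", "SF", "a"),
    ("tcp", "S0", "b"),
    ("tcp", "S0", "a"),
    ("icmp", "S0", "b"),
]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

selected, scores = select_top_k(rows, labels, k=2)
print("selected feature indices:", selected)
print("chi-square scores:", [round(s, 3) for s in scores])
```

In this toy data the flag column separates the two classes perfectly, so it receives the highest score, while the noise column scores lowest and is dropped. The reduced attribute set would then be passed to a classifier for the accuracy comparison described above.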
