Enhancing rough set theory attribute selection for KDD Cup 1999

Attribute selection (feature selection) is an important technique for data preprocessing and dimensionality reduction. Rough set theory has been used for attribute selection with considerable success. The optimal solution of rough set attribute selection is a subset of attributes called a reduct. Rough set theory uses approximations during the reduction process to handle inconsistent information. However, some rough set approaches to attribute selection are inadequate at finding optimal reducts, because no perfect heuristic can guarantee optimality. Applying rough sets to select the optimal attribute subset of KDD Cup 1999 does not guarantee finding the optimal reduct for each class of this dataset, because of the overlap between the lower and upper approximations of each class and the overlap among the reducts of the different classes. This paper introduces a new approach that enhances the reduct of all classes by overcoming this overlap problem: in addition to the normal reduct, it adds two new reducts built from the attributes of all dataset classes, one by union and one by voting. All reducts were evaluated using different classification algorithms. The approach generated two generic attribute sets that achieved high accuracy rates, comparable to those of the normal rough set attributes for the same dataset.
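The abstract does not spell out how the per-class reducts are computed or what threshold the voting uses, so the following is only a minimal Python sketch under stated assumptions. It builds lower and upper approximations from indiscernibility classes, uses a greedy QuickReduct-style heuristic on the dependency degree (the share of objects in the positive region) as a stand-in for whatever reduction algorithm the paper used, computes each class reduct on a one-vs-rest relabeling, and treats "voting" as keeping attributes chosen by a majority of class reducts. The helper names (`class_reducts`, `union_reduct`, `voting_reduct`) and the toy table are hypothetical and are not KDD Cup 1999 data.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence

Row = tuple[int, ...]  # condition-attribute values of one object


def partition(rows: Sequence[Row], attrs: Sequence[int]) -> list[set[int]]:
    """Indiscernibility classes: objects with identical values on `attrs`."""
    blocks: dict[tuple[int, ...], set[int]] = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def approximations(rows, attrs, target: set[int]) -> tuple[set[int], set[int]]:
    """Lower and upper approximations of the object set `target`."""
    lower: set[int] = set()
    upper: set[int] = set()
    for block in partition(rows, attrs):
        if block <= target:  # block lies entirely inside the concept
            lower |= block
        if block & target:  # block at least touches the concept
            upper |= block
    return lower, upper


def dependency(rows, labels: Sequence[Hashable], attrs) -> float:
    """Dependency degree: share of objects in the positive region."""
    positive: set[int] = set()
    for cls in set(labels):
        target = {i for i, y in enumerate(labels) if y == cls}
        positive |= approximations(rows, attrs, target)[0]
    return len(positive) / len(rows)


def quick_reduct(rows, labels, n_attrs: int) -> set[int]:
    """Greedy heuristic: add the attribute that raises dependency most
    until the full attribute set's dependency is reached (not guaranteed
    to be a minimal reduct)."""
    full = dependency(rows, labels, range(n_attrs))
    reduct: list[int] = []
    while dependency(rows, labels, reduct) < full:
        best = max(
            (a for a in range(n_attrs) if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return set(reduct)


def class_reducts(rows, labels, n_attrs: int) -> dict[Hashable, set[int]]:
    """One reduct per class from a one-vs-rest decision (an assumption)."""
    return {
        cls: quick_reduct(rows, [y == cls for y in labels], n_attrs)
        for cls in sorted(set(labels))
    }


def union_reduct(reducts: dict[Hashable, set[int]]) -> set[int]:
    """Every attribute that appears in at least one class reduct."""
    return set().union(*reducts.values())


def voting_reduct(reducts: dict[Hashable, set[int]], min_votes: int) -> set[int]:
    """Attributes chosen by at least `min_votes` class reducts."""
    votes = Counter(a for reduct in reducts.values() for a in reduct)
    return {a for a, n in votes.items() if n >= min_votes}


# Hypothetical toy decision table: attributes a0..a3, four classes.
ROWS = [(1, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 0), (0, 1, 1, 1),
        (0, 0, 1, 0), (0, 0, 1, 1), (0, 0, 0, 0), (0, 0, 0, 1)]
LABELS = ["dos", "dos", "probe", "probe", "r2l", "r2l", "normal", "normal"]

normal_reduct = quick_reduct(ROWS, LABELS, n_attrs=4)  # multi-class reduct
reducts = class_reducts(ROWS, LABELS, n_attrs=4)
majority = len(reducts) // 2 + 1  # assumed voting threshold
print(approximations(ROWS, [1, 2], {2, 3}))  # probe seen through a1, a2
print(normal_reduct, reducts)
print(union_reduct(reducts), voting_reduct(reducts, majority))
```

On this toy table, using only a1 and a2 makes probe object 3 indiscernible from dos object 1, so the lower approximation of probe is {2} while its upper approximation is {1, 2, 3}. The class reducts come out as dos {a0}, normal {a0, a1, a2}, probe {a0, a1} and r2l {a1, a2}; their union is {a0, a1, a2}, whereas a majority vote (at least 3 of 4 classes) keeps only {a0, a1}. A different reduction heuristic or voting threshold would give different sets.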
