An exact feature selection algorithm based on rough set theory

Feature reduction based on rough set theory is an effective feature selection method in pattern recognition applications. Finding a minimal subset of the original features is inherent in the rough set approach to feature selection. Because feature reduction is an NP-hard problem, fast optimal or near-optimal feature selection algorithms are needed. This article proposes an exact rough set feature selection algorithm that is efficient in terms of computation time. The proposed algorithm examines a solution tree using a breadth-first strategy. Pruned nodes are held in a version of the trie data structure. By the monotonic property of the dependency degree, no subset of a pruned node can be an optimal solution. Thus, once these subsets are detected in the trie, their dependency degrees need not be calculated. The search on the tree continues until the optimal solution is found. The algorithm is improved by starting the search at a level determined by the hill-climbing method instead of at the level below the root. The length of the minimal reduct and the size of the data set can influence which starting search level is more efficient. Experimental results on some of the standard UCI data sets demonstrate that the proposed algorithm is effective and efficient for data sets with more than 30 features. © 2014 Wiley Periodicals, Inc. Complexity 20: 50-62, 2015
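
The pruning idea can be illustrated with a minimal sketch. It is not the article's implementation: it reads the abstract as describing a top-down, level-by-level search from the full feature set, it replaces the trie with a plain list and a subset test, and it omits the hill-climbing choice of starting level. The names `dependency_degree` and `minimal_reducts` and the toy decision table are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def dependency_degree(rows, features, decision):
    """Fraction of rows whose values on `features` determine `decision`."""
    feats = sorted(features)
    blocks = defaultdict(list)
    for row in rows:
        # Group rows that are indiscernible on the chosen features.
        blocks[tuple(row[f] for f in feats)].append(row[decision])
    # A block lies in the positive region if all its decisions agree.
    consistent = sum(len(v) for v in blocks.values() if len(set(v)) == 1)
    return consistent / len(rows)


def minimal_reducts(rows, features, decision):
    """Level-wise search from the full set down; returns all smallest reducts."""
    full = frozenset(features)
    target = dependency_degree(rows, full, decision)
    survivors = [full]   # nodes on the current level that keep the target degree
    pruned = []          # assumption: a list stands in for the article's trie
    while True:
        # Next level: drop one feature from each surviving node.
        candidates = {node - {f} for node in survivors for f in node}
        next_survivors = []
        for cand in candidates:
            # Monotonicity: a subset of a pruned node cannot reach the
            # target, so its dependency degree is never computed.
            if any(cand <= p for p in pruned):
                continue
            if dependency_degree(rows, cand, decision) == target:
                next_survivors.append(cand)
            else:
                pruned.append(cand)
        if not next_survivors:
            return survivors
        survivors = next_survivors


# Toy decision table (hypothetical): d = a XOR b, and c duplicates a.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]
reducts = minimal_reducts(rows, ["a", "b", "c"], "d")
print(sorted(sorted(r) for r in reducts))  # [['a', 'b'], ['b', 'c']]
```

In this toy table the node {a, c} is pruned on the first level, so its subsets {a} and {c} are skipped on the next level without any dependency calculation. A trie would make that subset lookup faster than the linear scan used here.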
