Using Rough Sets with Heuristics for Feature Selection

Practical machine learning algorithms are known to degrade in performance (prediction accuracy) when faced with many features (sometimes called attributes) that are not necessary for rule discovery. To cope with this problem, many methods for selecting a subset of features have been proposed. Two typical approaches are the filter approach, which selects a feature subset in a preprocessing step, and the wrapper approach, which searches the space of possible feature subsets for an optimal one using the induction algorithm itself as part of the evaluation function. Although the filter approach is faster, it is somewhat blind: the performance of the induction algorithm is not taken into account. The wrapper approach, on the other hand, can find optimal feature subsets, but its time and space complexity make it difficult to apply. In this paper, we propose an algorithm that uses rough set theory with greedy heuristics for feature selection. Feature selection proceeds as in the filter approach, but the evaluation criterion is tied to the performance of induction; that is, we select features that do not damage the performance of induction.
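To make the idea concrete, the following is a minimal sketch of greedy, rough-set-based feature selection in the QuickReduct style: attributes are added one at a time, choosing at each step the attribute that most increases the rough-set dependency degree (the size of the positive region), until the selected subset discerns the decision as well as the full attribute set. This is an illustration of the general technique under these assumptions, not the authors' exact algorithm; all function and variable names (partition, dependency_degree, greedy_reduct) are illustrative.

    # Sketch of greedy rough-set feature selection (QuickReduct-style).
    # Illustrative only; not the exact algorithm proposed in the paper.
    from collections import defaultdict


    def partition(rows, attrs):
        """Group row indices into equivalence classes induced by the given attributes."""
        blocks = defaultdict(list)
        for i, row in enumerate(rows):
            blocks[tuple(row[a] for a in attrs)].append(i)
        return list(blocks.values())


    def dependency_degree(rows, decisions, attrs):
        """gamma(attrs -> decision): fraction of rows in the positive region,
        i.e. rows whose equivalence class under attrs has a single decision value."""
        pos = 0
        for block in partition(rows, attrs):
            if len({decisions[i] for i in block}) == 1:
                pos += len(block)
        return pos / len(rows)


    def greedy_reduct(rows, decisions, all_attrs):
        """Greedily add the attribute that most increases the dependency degree
        until it matches the dependency degree of the full attribute set."""
        full = dependency_degree(rows, decisions, all_attrs)
        selected = []
        while dependency_degree(rows, decisions, selected) < full:
            best = max(
                (a for a in all_attrs if a not in selected),
                key=lambda a: dependency_degree(rows, decisions, selected + [a]),
            )
            selected.append(best)
        return selected


    if __name__ == "__main__":
        # Toy decision table: condition attributes a, b, c and a decision column.
        rows = [
            {"a": 1, "b": 0, "c": 1},
            {"a": 1, "b": 1, "c": 0},
            {"a": 0, "b": 1, "c": 1},
            {"a": 0, "b": 0, "c": 0},
        ]
        decisions = ["yes", "no", "no", "yes"]
        print(greedy_reduct(rows, decisions, ["a", "b", "c"]))  # -> ['b']

In this toy table, attribute b alone already yields a positive region covering every row, so the greedy search stops after one step; the same mechanism scales to larger tables, with the dependency degree acting as the filter-style criterion that nevertheless reflects how well the selected attributes preserve the decision.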
