A Novel Algorithm Based on Conditional Entropy Established by Clustering for Feature Selection

Feature selection is an important issue in machine learning, and rough set theory is one of the principal methods for it. In rough set theory, feature selection has been studied separately from the algebra view and the information view. Unfortunately, previously proposed methods based on information entropy focus only on discrete datasets, while effectively discretizing continuous datasets remains challenging, since discretization may lose useful information. To overcome this disadvantage, in this paper we introduce a novel algorithm based on conditional entropy established by a clustering strategy for feature selection (ACECFS). In ACECFS, the data projected onto each feature are first partitioned into several clusters; the conditional entropy of a set of features is then computed conveniently from these clusters and a ranked feature list is generated, from which a relevant and compact feature subset can be obtained. Experiments show the effectiveness of ACECFS.
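The abstract's pipeline (cluster each feature's projected values, estimate conditional entropy from the clusters, rank features by it) can be illustrated with a minimal sketch. This is not the paper's algorithm: the clustering step is replaced by simple quantile binning of each one-dimensional projection, and features are scored individually rather than as subsets; `cluster_1d`, `conditional_entropy`, and `rank_features` are hypothetical names introduced here.

```python
import numpy as np
from collections import Counter

def cluster_1d(values, k=3):
    # Stand-in for the clustering step: split the projected data for
    # one feature into k clusters using quantile boundaries.
    edges = np.quantile(values, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(values, edges)

def conditional_entropy(cluster_ids, labels):
    # H(Y | C) = sum_c p(c) * H(Y | C = c), estimated from counts.
    n = len(labels)
    h = 0.0
    for c in set(cluster_ids):
        mask = cluster_ids == c
        p_c = mask.sum() / n
        counts = Counter(labels[mask])
        h_c = -sum((m / mask.sum()) * np.log2(m / mask.sum())
                   for m in counts.values())
        h += p_c * h_c
    return h

def rank_features(X, y, k=3):
    # Rank features by ascending conditional entropy H(y | clusters):
    # lower entropy means the feature's clusters predict y better.
    scores = [conditional_entropy(cluster_1d(X[:, j], k), y)
              for j in range(X.shape[1])]
    return np.argsort(scores)

# Toy data: feature 0 carries the class signal, feature 1 is noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = np.column_stack([y + 0.05 * rng.normal(size=200),
                     rng.normal(size=200)])
print(rank_features(X, y))  # informative feature should rank first
```

A compact subset would then be taken from the front of this ranked list, e.g. stopping once adding further features no longer reduces the conditional entropy.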
