Consistency Based Feature Selection

Feature selection is an effective technique for dimensionality reduction in classification tasks, a main component of data mining. It searches for an "optimal" subset of features. The search strategies under consideration fall into three categories: complete, heuristic, and probabilistic. Existing algorithms adopt various measures to evaluate the goodness of feature subsets. This work focuses on one such measure, called consistency. We study its properties in comparison with other major measures, and we examine different ways of using it to search for feature subsets. We conduct an empirical study of the pros and cons of these search methods when they use consistency. Through this exercise, we aim to provide a comprehensive view of the measure and its relations to other measures, together with guidelines for choosing a search strategy with it when facing a new application.
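The abstract does not define the consistency measure, so the following is only a minimal sketch under an assumption: that consistency is scored through the commonly used inconsistency rate, in which instances that agree on every selected feature are grouped together and each group contributes its size minus the count of its majority class. The function name `inconsistency_rate` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Fraction of instances that conflict with the majority class of their group.

    Assumed definition (not stated in the abstract): instances are grouped by
    their values on the features in `subset`; each group contributes
    (group size - majority class count) inconsistent instances.
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)  # projection onto the chosen features
        groups[key][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


# Hypothetical toy data: three discrete features and a class label per row.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (0, 0, 0)]
labels = ["a", "a", "b", "b", "b"]

rate_first_only = inconsistency_rate(rows, labels, [0])
rate_first_and_third = inconsistency_rate(rows, labels, [0, 2])
```

Under this assumed definition, a subset with a rate of zero separates the classes as well as the full feature set does on the given data, so a search procedure would look for the smallest subset whose rate stays at or below a chosen threshold.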
