Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning

Algorithms for feature selection fall into two broad categories: wrappers, which use the learning algorithm itself to evaluate the usefulness of features, and filters, which evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven more practical than wrappers because they are much faster. However, most existing filter algorithms handle only discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to both continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF, reducing the data dimensionality by fifty percent in most cases. Decision and model trees built from the preprocessed data are also often significantly smaller.
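
The abstract does not spell out the algorithm's evaluation function, so the following is only a minimal sketch of a correlation-based filter in this spirit. It assumes the standard CFS-style merit heuristic (average feature-class correlation traded off against average feature-feature correlation), uses plain Pearson correlation as the correlation measure (the paper's handling of discrete attributes is not reproduced here), and pairs it with a greedy forward search. The function names `merit` and `cfs_forward` are illustrative, not taken from the paper.

```python
# Sketch of a correlation-based filter (CFS-style). Assumptions:
# Pearson correlation as the correlation measure and greedy forward search.
import numpy as np

def pearson(a, b):
    """Absolute Pearson correlation between two numeric vectors."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return abs(np.mean(a * b))

def merit(subset, X, y):
    """Heuristic merit of a feature subset:
    k * r_cf / sqrt(k + k*(k-1)*r_ff), i.e. reward correlation with the
    class (r_cf) and penalize redundancy among features (r_ff)."""
    k = len(subset)
    r_cf = np.mean([pearson(X[:, j], y) for j in subset])
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(i, j) for i in subset for j in subset if i < j]
        r_ff = np.mean([pearson(X[:, i], X[:, j]) for i, j in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(X, y):
    """Greedy forward selection: repeatedly add the feature that most
    improves the subset merit; stop when no addition helps."""
    remaining, selected, best = set(range(X.shape[1])), [], -np.inf
    while remaining:
        scores = {j: merit(selected + [j], X, y) for j in remaining}
        j, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected

if __name__ == "__main__":
    # Toy numeric-class example: only columns 0 and 3 are relevant.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.1, size=200)
    print(cfs_forward(X, y))  # expected to pick columns 0 and 3
```

Because the forward search stops as soon as no candidate feature improves the merit, the size of the selected subset is determined automatically rather than fixed in advance, which is what allows a filter of this kind to cut dimensionality substantially without tuning a threshold.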
