Paper information - Feature Selection for Machine Learning: Comparing a Correlation-Based Filter Approach to the Wrapper

Feature selection is often an essential data processing step prior to applying a learning algorithm. The removal of irrelevant and redundant information often improves the performance of machine learning algorithms. There are two common approaches: a wrapper uses the intended learning algorithm itself to evaluate the usefulness of features, while a filter evaluates features according to heuristics based on general characteristics of the data. The wrapper approach is generally considered to produce better feature subsets but runs much more slowly than a filter. This paper describes a new filter approach to feature selection that uses a correlation-based heuristic to evaluate the worth of feature subsets. When applied as a data preprocessing step for two common machine learning algorithms, the new method compares favourably with the wrapper but requires much less computation.
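The abstract does not state the heuristic itself. As a rough illustration only, the sketch below scores a feature subset with the merit function commonly associated with correlation-based feature selection, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. Pearson correlation is used here as a stand-in assumption, and the names `pearson` and `cfs_merit` are hypothetical, not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant column counts as uncorrelated
    return cov / sqrt(vx * vy)


def cfs_merit(features, target):
    """Score a feature subset: reward correlation with the target,
    penalise correlation among the features themselves."""
    k = len(features)
    if k == 0:
        return 0.0
    # Mean absolute feature-target correlation.
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    # Mean absolute feature-feature correlation (0 for a single feature).
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


target = [1, 2, 3, 4]
relevant = [1, 2, 3, 4]      # perfectly correlated with the target
irrelevant = [1, -1, -1, 1]  # uncorrelated with the target

print(cfs_merit([relevant], target))
print(cfs_merit([relevant, irrelevant], target))
```

Under these assumptions, adding the irrelevant feature lowers the score, and adding an exact copy of the relevant feature does not raise it, which is the filter behaviour the abstract describes: the subset is evaluated from the data alone, without running the learning algorithm.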

Lloyd A. Smith | Mark A. Hall

[1] Nils J. Nilsson, et al. Artificial Intelligence, 1974, IFIP Congress.

[2] E. Ghiselli. Theory of psychological measurement, 1964.

[3] Larry A. Rendell, et al. A Practical Approach to Feature Selection, 1992, ML.

[4] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[5] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.

[6] Geoffrey Holmes, et al. Feature selection via the discovery of simple classification rules, 1995.

[7] Ron Kohavi, et al. MLC++: a machine learning library in C++, 1994, Proceedings Sixth International Conference on Tools with Artificial Intelligence. TAI 94.

[8] Ron Kohavi, et al. Irrelevant Features and the Subset Selection Problem, 1994, ICML.

[9] Daphne Koller, et al. Toward Optimal Feature Selection, 1996, ICML.

[10] Usama M. Fayyad, et al. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning, 1993, IJCAI.

[11] Tefko Saracevic, et al. Information science: What is it?, 1968.

[12] R. Wallace. Is this a practical approach?, 2001, Journal of the American College of Surgeons.

[13] Thomas G. Dietterich, et al. Efficient Algorithms for Identifying Relevant Features, 1992.

[14] Ron Kohavi, et al. Wrappers for performance enhancement and oblivious decision graphs, 1995.

[15] Lloyd A. Smith, et al. Practical feature subset selection for machine learning, 1998.