Recent work has shown that feature subset selection can have a positive effect on the performance of machine learning algorithms. Some algorithms can be slowed, or their performance degraded, by features that are irrelevant or redundant to the learning task. Feature subset selection, then, is a method for enhancing the performance of learning algorithms, reducing the hypothesis search space, and, in some cases, reducing storage requirements. This paper describes a feature subset selector that uses a correlation-based heuristic to evaluate the worth of feature subsets, and assesses its effectiveness with three common ML algorithms: a decision tree inducer (C4.5), a naive Bayes classifier, and an instance-based learner (IB1). Experiments using a number of standard data sets drawn from real and artificial domains are presented. Feature subset selection gave significant improvements for all three algorithms; in addition, C4.5 generated smaller decision trees.
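The central idea named in the abstract is a correlation-based heuristic for scoring feature subsets. As an illustration only (the abstract does not state the formula), the sketch below implements the merit score commonly associated with correlation-based feature subset selection, using Pearson correlation as a stand-in for whatever correlation measure the paper actually adopts, and a simple greedy forward search in place of the paper's search strategy; the function name cfs_merit and the toy data are assumptions made for the example.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Correlation-based merit of a feature subset (illustrative sketch).

    Merit = k * r_cf / sqrt(k + k*(k-1) * r_ff)
    where r_cf is the mean |feature-class correlation| and r_ff is the
    mean |feature-feature correlation| over the selected features.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each selected feature and the class.
    r_cf = np.mean([abs(np.corrcoef(X[:, f], y)[0, 1]) for f in subset])
    # Mean absolute pairwise correlation among the selected features.
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

# Illustrative usage: greedy forward selection on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
selected, remaining = [], list(range(X.shape[1]))
while remaining:
    best = max(remaining, key=lambda f: cfs_merit(X, y, selected + [f]))
    if cfs_merit(X, y, selected + [best]) <= cfs_merit(X, y, selected):
        break
    selected.append(best)
    remaining.remove(best)
print(selected)
```

The denominator penalises subsets whose features are highly correlated with one another, so the score favours features that individually predict the class while being mutually non-redundant, which matches the intuition described in the abstract.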