Correlation-based feature selection method

Feature selection is an important data preprocessing step performed before a learning algorithm is applied. A key consideration when proposing a feature selection method is its computational complexity: a fast selection process often cannot search the feature subset space thoroughly, and classification accuracy suffers as a result. Recently, a pairwise feature selection method was proposed as an effective trade-off between computation speed and classification accuracy. In this paper, a new feature selection method is proposed that further improves selection speed while preserving classification accuracy. The new method selects features individually or in pairs, guided by the correlations between features. Experiments conducted on several benchmark data sets show, with high statistical significance, that the correlation-based feature selection method requires less computation than the pairwise feature selection method while producing classification errors no worse than those of existing methods.
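The general idea described above can be sketched in code. The following is a minimal illustration, not the paper's actual algorithm: it assumes features are first ranked by the absolute correlation with the class, and that inter-feature correlation decides whether a candidate is admitted on its own or rejected as redundant with an already selected feature. The function name, the `pair_threshold` parameter, and the greedy loop are all hypothetical choices made for this sketch.

```python
import numpy as np

def correlation_feature_selection(X, y, k, pair_threshold=0.8):
    """Hedged sketch of a correlation-guided selector (illustrative only).

    Ranks features by |Pearson correlation| with the target, then greedily
    keeps a feature unless it is too strongly correlated (>= pair_threshold)
    with a feature already selected. This avoids exhaustive subset search,
    which is the speed motivation discussed in the abstract.
    """
    n_features = X.shape[1]
    # Relevance: absolute correlation of each feature with the target.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    order = np.argsort(-relevance)  # most relevant first
    selected = []
    for j in order:
        if len(selected) >= k:
            break
        # Redundancy check against already selected features only:
        # O(k) correlations per candidate instead of a full subset search.
        if all(
            abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < pair_threshold
            for s in selected
        ):
            selected.append(int(j))
    return selected
```

For example, given three features where the second is an exact rescaling of the first, the sketch keeps the first and skips its redundant copy, then admits the weakly correlated third feature.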
