Improved Feature Selection Algorithm Based on SVM and Correlation

As a feature selection method, support vector machine recursive feature elimination (SVM-RFE) can remove irrelevant features but does not take redundant features into account. This paper shows why the method cannot remove redundant features and presents an improved technique. The correlation coefficient is introduced to measure redundancy within the subset selected by SVM-RFE, and features that are highly correlated with a more important feature are removed. Experimental results show that the subsets selected by SVM-RFE do contain several strongly redundant features, with correlation coefficients as high as 0.99. The proposed method reduces the number of features while maintaining classification accuracy.
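The redundancy-removal step described above can be illustrated with a minimal sketch. It is not the implementation from the paper: the function names `pearson` and `drop_redundant`, the threshold of 0.9, the use of the absolute value of the Pearson coefficient, and the toy data are all assumptions made for illustration. The importance ranking is taken as given here, standing in for the order that SVM-RFE would produce.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / sqrt(vx * vy)


def drop_redundant(columns, ranked, threshold=0.9):
    """Keep a feature only if it is not strongly correlated with any
    more important feature that has already been kept.

    columns   -- dict mapping feature name to its list of values
    ranked    -- feature names, most important first (assumed to come
                 from a ranking step such as SVM-RFE)
    threshold -- hypothetical cut-off on |r|; the source does not state one
    """
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (made up): f2 is almost an exact multiple of f1.
columns = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
ranked = ["f1", "f2", "f3"]  # assumed importance order
selected = drop_redundant(columns, ranked, threshold=0.9)
print(selected)
```

In this toy case `f2` is dropped because its correlation with the higher-ranked `f1` exceeds the threshold, while `f3` is retained. Processing features in importance order means that, of two redundant features, the more important one always survives.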
