An Effective Feature Selection Method via Mutual Information Estimation

This paper proposes a new feature selection method that ranks features by a mutual information-based criterion within a backward elimination framework. The criterion accounts for dependencies among multiple features and can be computed with either of two well-known probability density function estimation methods. The proposed approach is compared with existing mutual information-based methods and another sophisticated filter method on a range of artificial and real-world problems. The numerical results show that the proposed method effectively identifies the important features in data sets with dependencies among many features and outperforms the benchmark methods in almost all cases.
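The backward elimination loop described above can be sketched as follows. This is a simplified illustration, not the paper's method: it scores each feature by its individual estimated mutual information with the class label (via scikit-learn's `mutual_info_classif`), whereas the paper's criterion also models dependencies among features; the function name `backward_mi_selection` is our own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

def backward_mi_selection(X, y, n_keep):
    """Greedy backward elimination: repeatedly drop the feature with the
    lowest estimated mutual information with the class label.
    (Simplified stand-in for the paper's dependency-aware criterion.)"""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        # Re-estimate MI on the surviving feature subset each round.
        mi = mutual_info_classif(X[:, remaining], y, random_state=0)
        remaining.pop(int(np.argmin(mi)))  # drop the least informative feature
    return remaining

# Synthetic problem with 3 informative features out of 10.
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
selected = backward_mi_selection(X, y, n_keep=3)
print(selected)
```

Because the per-feature score ignores interactions, this sketch can miss features that are only jointly informative; capturing such cases is precisely what the paper's joint criterion is designed for.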
