Estimating optimal feature subsets using efficient estimation of high-dimensional mutual information

A novel feature selection method based on the concept of mutual information (MI) is proposed in this paper. In all MI-based feature selection methods, effective and efficient estimation of high-dimensional MI is crucial. In this paper, a pruned Parzen window estimator and the quadratic mutual information (QMI) are combined to address this problem. The results show that the proposed approach estimates MI both effectively and efficiently. With this contribution, a feature selection method is developed that identifies the salient features one by one; in addition, appropriate feature subsets for classification can be reliably estimated. The proposed methodology is thoroughly tested on four different classification applications in which the number of features ranges from fewer than 10 to more than 15,000. The presented results are very promising and corroborate the contribution of the proposed feature selection methodology.
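
The abstract describes two ingredients: a Gaussian Parzen window density model, under which the quadratic MI integrals have closed-form solutions, and a greedy search that adds one salient feature at a time. The sketch below illustrates both under assumptions the abstract does not fix: it uses a standard (unpruned) Gaussian Parzen/QMI estimator in the style of Torkkola's formulation with a single isotropic bandwidth sigma, and it omits the paper's pruning of negligible kernel contributions. The function names (gaussian_gram, quadratic_mi, greedy_forward_selection) are illustrative, not taken from the paper.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Pairwise kernel matrix G(x_i - x_j, 2*sigma^2 I).

    By the Gaussian convolution identity, each entry is the closed-form
    integral of the product of two Parzen kernels centered at x_i and x_j.
    """
    d = X.shape[1]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    var = 2.0 * sigma ** 2
    norm = (2.0 * np.pi * var) ** (-d / 2.0)
    return norm * np.exp(-sq / (2.0 * var))

def quadratic_mi(X, y, sigma=1.0):
    """Quadratic MI between continuous features X and discrete labels y.

    Torkkola-style estimator: three "information potentials" (within-class,
    all-pairs, between) combined as V_in + V_all - 2*V_btw, each computable
    in closed form under the Gaussian Parzen model.
    """
    n = len(y)
    G = gaussian_gram(X, sigma)              # O(n^2) cost; the paper's
    classes, counts = np.unique(y, return_counts=True)  # pruning targets this
    priors = counts / n

    v_in = sum(G[np.ix_(y == c, y == c)].sum() for c in classes) / n**2
    v_all = (priors ** 2).sum() * G.sum() / n**2
    v_btw = sum(p * G[y == c, :].sum()
                for c, p in zip(classes, priors)) / n**2
    return v_in + v_all - 2.0 * v_btw

def greedy_forward_selection(X, y, k, sigma=1.0):
    """Select k features one by one, each step adding the candidate whose
    inclusion maximizes QMI(selected features; labels)."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        scores = [(quadratic_mi(X[:, selected + [f]], y, sigma), f)
                  for f in remaining]
        _, best = max(scores)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In practice the bandwidth sigma must be supplied by the user, for instance via a rule of thumb such as Silverman's. The full N-by-N kernel matrix makes each QMI evaluation quadratic in the sample size, which is precisely the cost the paper's pruned Parzen estimator is meant to reduce.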
