A Powerful Feature Selection Approach Based on Mutual Information

Summary Feature selection aims to reduce the dimensionality of patterns for classificatory analysis by selecting the most informative features rather than irrelevant and/or redundant ones. In this paper we propose a novel feature selection measure that is based on mutual information and takes the interaction between features into consideration. The proposed measure is used to determine the relevant features from the original feature set for a pattern recognition problem. We use a Support Vector Machine (SVM) classifier to compare the performance of our measure against recently proposed information-theoretic criteria. Very good performance is obtained when applying this method to handwritten digit recognition data.
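The abstract does not give the formula of the proposed measure, so as an illustration only, the sketch below implements a related interaction-aware filter in the same family: greedy selection by conditional mutual information in the style of Fleuret's CMIM criterion. All function names are my own, and the score used here is an assumption standing in for the paper's actual measure, not a reconstruction of it.

```python
from collections import Counter
from math import log2

def mutual_info(xs, ys):
    """Empirical mutual information I(X;Y) for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def cond_mutual_info(xs, ys, zs):
    """I(X;Y|Z) = sum_z p(z) * I(X;Y | Z=z), estimated from counts."""
    n = len(zs)
    total = 0.0
    for z in set(zs):
        idx = [i for i in range(n) if zs[i] == z]
        total += (len(idx) / n) * mutual_info([xs[i] for i in idx],
                                              [ys[i] for i in idx])
    return total

def cmim_select(features, labels, k):
    """Greedy CMIM-style selection (an assumption, not the paper's measure):
    repeatedly pick the feature f maximizing min over already-selected s of
    I(f; labels | s); this penalizes features made redundant by earlier picks
    while rewarding genuinely complementary (interacting) ones."""
    remaining = set(range(len(features)))
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: min(
            [cond_mutual_info(features[f], labels, features[s])
             for s in selected],
            default=mutual_info(features[f], labels)))
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, given a feature that equals the class label, an exact copy of it, and pure noise, this criterion picks the informative feature first and scores its redundant copy at zero conditional mutual information, which is the redundancy-avoiding behavior the abstract alludes to.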
