A New Perspective for Information Theoretic Feature Selection

Feature filters are among the simplest and fastest approaches to feature selection. A filter defines a statistical criterion used to rank features by how useful they are expected to be for classification; the highest-ranking features are retained and the lowest-ranking can be discarded. A common approach is to use the mutual information between a feature and the class label. This area has seen a recent flurry of activity, resulting in a confusing variety of heuristic criteria, all based on mutual information, with no principled way to understand or relate them. The contribution of this paper is a unifying theoretical understanding of such filters. In contrast to current methods, which manually construct filter criteria with particular properties, we show how to naturally derive a space of possible ranking criteria. We will show that several recent contributions in the feature selection literature are points within this continuous space, and that many points in the space have never been explored.
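As a concrete illustration of the baseline filter described above, the sketch below scores each feature by its empirical mutual information with the class label and ranks features from most to least informative. This is a minimal sketch, not the paper's method: it assumes discrete-valued features, uses plug-in (maximum-likelihood) probability estimates, and the function names are illustrative.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two discrete 1-D arrays."""
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)                      # empirical P(X = xv)
        for yv in np.unique(y):
            py = np.mean(y == yv)                  # empirical P(Y = yv)
            pxy = np.mean((x == xv) & (y == yv))   # empirical P(X = xv, Y = yv)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

def rank_features(X, y):
    """Rank the columns of X by I(X_k; Y), highest first."""
    scores = np.array([mutual_information(X[:, k], y) for k in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores

# Toy usage: a noisy copy of the label should outrank an independent feature.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
flip = rng.random(500) < 0.1
relevant = np.where(flip, 1 - y, y)        # agrees with the label 90% of the time
noise = rng.integers(0, 2, size=500)       # independent of the label
X = np.column_stack([noise, relevant])
order, scores = rank_features(X, y)
print(order, scores)                        # expect column 1 ranked above column 0
```

Such univariate ranking ignores interactions between features, which is exactly the gap the heuristic multivariate criteria surveyed in the paper try to close.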
