A Novel Information Theory Method for Filter Feature Selection

In this paper, we propose a novel filter for feature selection. The filter relies on estimating the mutual information between features and classes. We bypass explicit estimation of the probability density function by using the entropic-graph approximation of the Rényi entropy and its subsequent extrapolation to the Shannon entropy. The complexity of this estimator depends on the number of patterns/samples rather than on the number of dimensions, so the curse of dimensionality is circumvented. We show that the resulting filter can outperform a greedy algorithm based on the maximal-relevance minimal-redundancy (mRMR) criterion. We successfully test our method in the contexts of both image classification and microarray data classification.
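
To make the pipeline concrete, the following is a minimal sketch of the kind of entropic-graph estimation the abstract alludes to: the Rényi α-entropy of a sample is read off the length of its Euclidean minimal spanning tree (in the style of Hero's entropic spanning graphs), the Shannon entropy is recovered by extrapolating the Rényi estimates toward α → 1, and the mutual information between a feature subset and the class follows from I(X; C) = H(X) − Σ_c p(c) H(X | C = c). The function names, the choice of α values, the linear extrapolation scheme, and the omission of the bias constant β are all illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def renyi_entropy_mst(X, alpha):
    """MST-based estimate of the Renyi alpha-entropy of a sample X (n x d).

    Requires d >= 2 and 0 < alpha < 1, so that the edge exponent
    gamma = d * (1 - alpha) lies in (0, d). The additive bias constant
    log(beta_{d,gamma}) has no closed form and is dropped here, which is
    harmless when entropies are only *compared*, as in a filter ranking.
    """
    n, d = X.shape
    gamma = d * (1.0 - alpha)
    W = squareform(pdist(X)) ** gamma        # gamma-weighted edge lengths
    W[W == 0] = np.finfo(float).tiny         # scipy treats 0 as "no edge"
    np.fill_diagonal(W, 0.0)
    L = minimum_spanning_tree(W).sum()       # total MST length L_gamma
    return np.log(L / n ** alpha) / (1.0 - alpha)

def shannon_entropy_mst(X, alphas=(0.8, 0.9)):
    """Approximate Shannon entropy by extrapolating Renyi estimates to alpha -> 1."""
    h0, h1 = (renyi_entropy_mst(X, a) for a in alphas)
    slope = (h1 - h0) / (alphas[1] - alphas[0])
    return h1 + slope * (1.0 - alphas[1])    # linear extrapolation in alpha

def mutual_information_mst(X, y, alphas=(0.8, 0.9)):
    """I(X; C) = H(X) - sum_c p(c) H(X | C = c), each term estimated via MSTs.

    Each class should contribute more than a couple of samples, since an
    MST over one or two points carries no usable entropy information.
    """
    classes, counts = np.unique(y, return_counts=True)
    h_cond = sum(p * shannon_entropy_mst(X[y == c], alphas)
                 for c, p in zip(classes, counts / len(y)))
    return shannon_entropy_mst(X, alphas) - h_cond
```

Under these assumptions, a max-dependency forward search would, at each step, add the candidate feature x that maximizes mutual_information_mst(X[:, S + [x]], y) for the current subset S; because the estimator's cost is governed by the MST over the n samples rather than by the dimension of the subset, the search remains tractable even when the evaluated subsets are high-dimensional.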
