A Mutual Information estimator for continuous and discrete variables applied to Feature Selection and Classification problems

Abstract
Mutual information is widely used in pattern recognition and feature selection problems. It can serve both as a measure of redundancy between features and as a measure of dependency for evaluating the relevance of each feature. Since the marginal densities of real datasets are usually unknown in advance, mutual information must be estimated from data. The estimators in the literature were designed specifically for either continuous or discrete variables; however, most real problems involve a mixture of both, so applying one of these estimators to mixed data entails an implicit loss of information. This paper presents a new estimator that can handle mixed sets of continuous and discrete variables. Experiments with synthetic and real datasets show that the method yields reliable results in these circumstances.
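As background for the problem the abstract describes, the following is a minimal sketch of a simple plug-in estimate of mutual information between one continuous feature and one discrete class label: the continuous variable is discretized into histogram bins, and MI is computed from the resulting joint distribution. This is a baseline illustration under assumed parameters (10 equal-width bins), not the estimator proposed in the paper, whose construction is not detailed in this abstract.

```python
import numpy as np

def mi_plugin(x_cont, y_disc, bins=10):
    """Plug-in mutual information (in nats) between a continuous
    variable, discretized into equal-width bins, and a discrete one."""
    # Interior bin edges: digitize then maps x into bin indices 0..bins-1.
    edges = np.histogram_bin_edges(x_cont, bins=bins)[1:-1]
    x_codes = np.digitize(x_cont, edges)
    # Map arbitrary discrete labels to codes 0..k-1.
    _, y_codes = np.unique(y_disc, return_inverse=True)

    # Empirical joint distribution over (x bin, y label).
    joint = np.zeros((bins, y_codes.max() + 1))
    for xi, yi in zip(x_codes, y_codes):
        joint[xi, yi] += 1
    p = joint / joint.sum()

    # I(X;Y) = sum p(x,y) log( p(x,y) / (p(x) p(y)) ), over nonzero cells.
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
mi_dep = mi_plugin(x, (x > 0).astype(int))        # strongly dependent pair
mi_ind = mi_plugin(x, rng.integers(0, 2, 5000))   # independent pair, MI near 0
```

For the dependent pair the estimate approaches ln 2 (the label is a deterministic function of the feature's sign), while for the independent pair it stays near zero apart from a small positive finite-sample bias; this bias is one reason such plug-in estimators are unreliable on small samples, motivating dedicated estimators like the one the paper proposes.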
