Resampling methods for parameter-free and robust feature selection with mutual information

Combining the mutual information criterion with a forward feature selection strategy offers a good trade-off between optimality of the selected feature subset and computation time. However, it requires setting the parameter(s) of the mutual information estimator and determining when to halt the forward procedure. These two choices are difficult to make because, as the dimensionality of the subset increases, the estimation of the mutual information becomes less and less reliable. This paper proposes to use resampling methods, K-fold cross-validation and the permutation test, to address both issues. The resampling methods provide information about the variance of the estimator, which can then be used to set the parameter automatically and to compute a threshold at which to stop the forward procedure. The procedure is illustrated on a synthetic data set as well as on real-world examples.
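
To make the procedure concrete, the following is a minimal Python sketch, assuming a standard k-nearest-neighbour (Kraskov-type) MI estimator and a permutation-based stopping rule for the forward search; the function names (ksg_mi, forward_select), the fixed value of k, and the significance level are illustrative choices, not the authors' exact settings, and the cross-validation step that the paper uses to tune the estimator parameter is omitted for brevity.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=6):
    """Kraskov-type (KSG, algorithm 1) k-NN estimate of I(x; y).
    x: (n, d) candidate feature subset, y: (n,) target."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(y)
    z = np.hstack([x, y])
    tree_z, tree_x, tree_y = cKDTree(z), cKDTree(x), cKDTree(y)
    # distance to the k-th neighbour in the joint space (Chebyshev norm),
    # shrunk slightly so the marginal counts use a strict inequality
    dist = tree_z.query(z, k=k + 1, p=np.inf)[0][:, -1]
    eps = np.maximum(dist - 1e-12, 0.0)
    nx = np.array([len(tree_x.query_ball_point(x[i], eps[i], p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(tree_y.query_ball_point(y[i], eps[i], p=np.inf)) - 1
                   for i in range(n)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

def forward_select(X, y, k=6, n_perm=100, alpha=0.05, seed=0):
    """Greedy forward selection driven by the MI estimate; the search stops
    when the best candidate subset's MI no longer exceeds the (1 - alpha)
    quantile of MI values obtained with a permuted target."""
    rng = np.random.default_rng(seed)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        mi = {j: ksg_mi(X[:, selected + [j]], y, k) for j in remaining}
        best = max(mi, key=mi.get)
        # permutation test: null distribution of the MI for the best subset
        null = [ksg_mi(X[:, selected + [best]], rng.permutation(y), k)
                for _ in range(n_perm)]
        if mi[best] <= np.quantile(null, 1.0 - alpha):
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

The same resampling machinery could be extended so that the estimator parameter k is itself chosen by K-fold cross-validation before the forward search starts, which is the second use of resampling described above.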
