Can high-order dependencies improve mutual information based feature selection?

Mutual information (MI) based approaches are a popular paradigm for feature selection. Most previous methods use low-dimensional MI quantities that can only detect low-order dependencies between variables. Several works have considered higher dimensional mutual information, but the theoretical underpinnings of these approaches are not yet comprehensive. To fill this gap, in this paper we systematically investigate the use of high-order dependencies in mutual information based feature selection. We first identify a set of assumptions under which the original high-dimensional MI criterion decomposes into a set of low-dimensional MI quantities (see the illustrative sketch below). By relaxing these assumptions, we arrive at a principled approach for constructing higher dimensional MI based feature selection methods that account for higher order feature interactions. Our extensive experimental evaluation on real data sets provides concrete evidence that methodological inclusion of high-order dependencies improves MI based feature selection.

Highlights

- We shed light on the assumptions behind many popular mutual information based feature selection approaches.
- We propose a theoretically sound approach for extending current mutual information based feature selection methods to handle high-order dependencies.
- We provide concrete experimental evidence that high-order dependencies improve mutual information based feature selection.
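To make the low-order decomposition concrete, here is a minimal sketch of greedy forward selection under the widely used pairwise criterion J(X_m) = I(X_m; Y) - beta * sum_j I(X_m; X_j) + gamma * sum_j I(X_m; X_j | Y); with beta = gamma = 1/|S| this recovers the JMI criterion from the unifying framework of Brown et al. (2012). The function names, the plug-in MI estimator, and the assumption of integer-coded discrete features are illustrative choices of ours, not the paper's implementation; a high-order method of the kind studied here would replace the pairwise terms with joint MI over feature triples or larger subsets.

    # A minimal sketch (not the paper's code) of greedy forward selection
    # under the pairwise, low-order approximation of I(X_S; Y).
    # Assumes features and labels are integer-coded discrete variables.
    import numpy as np

    def mutual_information(x, y):
        """Plug-in MI estimate (in nats) for two discrete variables."""
        joint = np.zeros((x.max() + 1, y.max() + 1))
        for xi, yi in zip(x, y):
            joint[xi, yi] += 1
        joint /= joint.sum()
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        nz = joint > 0
        return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

    def conditional_mi(x, z, y):
        """I(X; Z | Y) via the chain rule: I(X; (Z, Y)) - I(X; Y)."""
        zy = z * (y.max() + 1) + y  # code the pair (z, y) as one variable
        return mutual_information(x, zy) - mutual_information(x, y)

    def greedy_select(X, y, k, beta=None, gamma=None):
        """Pick k features by J(X_m) = I(X_m;Y)
           - beta * sum_j I(X_m;X_j) + gamma * sum_j I(X_m;X_j|Y).
        Defaults beta = gamma = 1/|S| give the JMI criterion."""
        selected, remaining = [], list(range(X.shape[1]))
        while len(selected) < k and remaining:
            def score(m):
                relevance = mutual_information(X[:, m], y)
                if not selected:
                    return relevance
                b = beta if beta is not None else 1.0 / len(selected)
                g = gamma if gamma is not None else 1.0 / len(selected)
                redundancy = sum(mutual_information(X[:, m], X[:, j])
                                 for j in selected)
                complement = sum(conditional_mi(X[:, m], X[:, j], y)
                                 for j in selected)
                return relevance - b * redundancy + g * complement
            best = max(remaining, key=score)
            selected.append(best)
            remaining.remove(best)
        return selected

Note that every term in the score involves at most two features at a time, which is exactly the low-order restriction the paper's relaxed assumptions are meant to lift.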
