Exploring Unique Relevance for Mutual Information based Feature Selection

Mutual information (MI), a measure from information theory, is widely used in feature selection. Despite its success, a promising feature property, the unique relevance (UR) of a feature, remains unexplored. In this paper, we improve the performance of mutual-information-based feature selection (MIBFS) by exploiting UR. We provide a theoretical justification for its value and prove that the optimal feature subset must contain all features with UR. Since existing MIBFS methods follow the criterion of Maximize Relevance with Minimum Redundancy (MRwMR), which ignores UR, we augment the criterion with the objective of boosting unique relevance (BUR), yielding a new criterion called MRwMR-BUR. Experiments on six public datasets show that MRwMR-BUR consistently outperforms MRwMR when tested with three popular classifiers. We believe this new insight can lead to new optimality bounds and algorithms.
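To make the ideas concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of an MRwMR-style greedy selector, using the classic mRMR instantiation (relevance minus mean redundancy), together with one plausible plug-in operationalization of unique relevance as the drop in joint MI when a feature is removed, i.e. I(F;Y) - I(F\{f};Y). The toy data, function names, and the UR estimator are all assumptions for illustration; the paper's formal definition of UR may differ.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for discrete samples."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum_{a,b} p(a,b) * log( p(a,b) / (p(a) p(b)) ), with counts folded in
    return max(sum(c / n * np.log(c * n / (px[a] * py[b]))
                   for (a, b), c in pxy.items()), 0.0)

def joint_mi(X, y):
    """I(F;Y) where F is the joint of all columns of X (rows as tuples)."""
    return mutual_information([tuple(row) for row in X], y)

def unique_relevance(X, y, j):
    """One simple operationalization of UR (an assumption, not the paper's
    definition): the loss in joint MI when feature j is removed."""
    return joint_mi(X, y) - joint_mi(np.delete(X, j, axis=1), y)

def mrwmr_select(X, y, k):
    """Greedy MRwMR in its classic mRMR form: at each step pick the feature
    maximizing relevance I(f;Y) minus mean redundancy with those selected."""
    d = X.shape[1]
    selected, remaining = [], list(range(d))
    rel = [mutual_information(X[:, j], y) for j in range(d)]
    for _ in range(k):
        best, best_score = None, -np.inf
        for j in remaining:
            red = (np.mean([mutual_information(X[:, j], X[:, s])
                            for s in selected]) if selected else 0.0)
            if rel[j] - red > best_score:
                best, best_score = j, rel[j] - red
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: x0 is a noisy copy of the label, x1 duplicates x0 exactly
# (redundant, so it carries no unique relevance), x2 is a weaker but
# independently noisy view of the label (so it does carry UR).
rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, n)
x0 = y ^ (rng.random(n) < 0.1)
x1 = x0.copy()
x2 = y ^ (rng.random(n) < 0.3)
X = np.stack([x0, x1, x2], axis=1)

print(mrwmr_select(X, y, 2))        # the exact duplicate is penalized
print(unique_relevance(X, y, 1))    # duplicated feature: UR is (near) zero
print(unique_relevance(X, y, 2))    # independent noisy view: positive UR
```

On this toy data the duplicate x1 scores high on relevance alone but is rejected by the redundancy term, while x2, despite lower individual relevance, contributes information no other feature provides, which is exactly the property the UR objective is meant to reward.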
