SURI: Feature Selection Based on Unique Relevant Information for Health Data

Feature selection, which identifies representative features in observed data, can increase the utility of health data for predictive diagnosis. Unlike feature extraction, such as PCA and autoencoder-based methods, feature selection preserves interpretability, meaning that it provides useful information about which feature subset is relevant to certain health conditions. Domain experts, such as clinicians, can learn from these relationships and use this knowledge to improve their diagnostic abilities. Mutual information (MI) based feature selection (MIBFS) is a classifier-independent approach that attempts to maximize the dependency (i.e., the MI) between the selected features and the target variable (label). However, implementing optimal MIBFS via exhaustive search with high-dimensional data can be prohibitively complex. As a result, many MIBFS approximation schemes have been developed in the literature. In this paper, we take another step forward by proposing a novel MIBFS method called Selection via Unique Relevant Information (SURI). We first quantify the unique relevant information (URI) present in each individual feature and use it to boost features with high URI. Via experiments on six healthcare data sets and three classifiers, we observe that SURI outperforms existing MIBFS methods with respect to standard classification metrics. Furthermore, using a low-dimensional data set, we investigate optimal feature selection via exhaustive search and confirm the important role of URI, further verifying the principles behind SURI.
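To make the MIBFS setting concrete, the sketch below implements a simple max-relevance greedy baseline: it estimates the empirical MI between each discrete feature and the label, then selects features in order of decreasing MI. This is a generic illustration of the MIBFS family the abstract describes, not the SURI algorithm itself (SURI additionally quantifies each feature's unique relevant information, which is not specified here); the function names and the toy data are ours.

```python
import numpy as np

def mutual_info(x, y):
    """Empirical mutual information (in nats) between two discrete arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint probability
            px = np.mean(x == xv)                 # marginals
            py = np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def greedy_mibfs(X, y, k):
    """Max-relevance greedy MIBFS baseline: repeatedly pick the unselected
    feature with the largest MI(feature; label). SURI would instead weight
    candidates by their unique relevant information."""
    scores = {j: mutual_info(X[:, j], y) for j in range(X.shape[1])}
    remaining = list(range(X.shape[1]))
    selected = []
    for _ in range(k):
        best = max(remaining, key=lambda j: scores[j])
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: features 0 and 2 determine the label, feature 1 is noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = np.column_stack([y, rng.integers(0, 2, 200), 1 - y])
print(greedy_mibfs(X, y, 2))  # the two label-determined features are chosen first
```

Note that pure max-relevance ignores redundancy between selected features (here features 0 and 2 carry identical information about the label), which is exactly the kind of limitation that motivates more refined MIBFS criteria such as the one proposed in the paper.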
