Target-Focused Feature Selection Using a Bayesian Approach

In many real-world scenarios where data is high-dimensional, acquiring features at test time is non-trivial because of the costs of acquiring each feature and of evaluating its value. The need for highly confident models under an extremely frugal feature-acquisition budget can be met by making the feature selection method target-aware. We introduce a feature selection approach based on Bayesian learning, which allows us to report target-specific levels of uncertainty and of false-positive and false-negative rates. Measuring uncertainty also lifts the restriction that feature selection be target-agnostic, allowing features to be acquired for a single target of focus out of many. We show that acquiring features for a specific target performs at least as well as common linear feature selection approaches on small, non-sparse datasets, and surpasses them on real-world healthcare data that is larger in scale and sparser.
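To make the idea of target-focused acquisition concrete, the following is a minimal sketch, not the paper's actual algorithm. It assumes a hypothetical diagonal-Gaussian posterior over the weights of a linear score, and greedily buys, for one specific target instance, the feature whose observation most reduces the predictive variance of that target's score. The function names (`score_variance`, `acquire_features`) and the variance decomposition are illustrative assumptions; a real method would take an expectation over the unseen feature value rather than peeking at it as this sketch does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical diagonal-Gaussian posterior over linear weights.
# In practice this would come from an approximate Bayesian method
# (e.g., variational inference), not from random draws.
n_features = 10
w_mean = rng.normal(size=n_features)              # posterior means
w_var = rng.uniform(0.1, 1.0, size=n_features)    # posterior variances

def score_variance(x, observed):
    """Predictive variance of the score w^T x for a single target.

    Observed features contribute only weight uncertainty; unobserved
    features also contribute feature uncertainty (assumed unit variance
    here for simplicity).
    """
    var = 0.0
    for j in range(n_features):
        if observed[j]:
            var += w_var[j] * x[j] ** 2           # weight uncertainty only
        else:
            var += w_var[j] + w_mean[j] ** 2      # weight + feature uncertainty
    return var

def acquire_features(x, budget):
    """Greedy, target-specific acquisition: repeatedly buy the feature
    whose observation most reduces this one target's score variance."""
    observed = np.zeros(n_features, dtype=bool)
    for _ in range(budget):
        candidates = np.flatnonzero(~observed)
        base = score_variance(x, observed)
        gains = []
        for j in candidates:
            trial = observed.copy()
            trial[j] = True
            # NOTE: uses the true x[j]; a faithful method would average
            # this gain over the posterior predictive of the unseen value.
            gains.append(base - score_variance(x, trial))
        best = candidates[int(np.argmax(gains))]
        observed[best] = True
    return observed

x = rng.normal(size=n_features)   # the single target instance of focus
chosen = acquire_features(x, budget=3)
print("acquired features for this target:", np.flatnonzero(chosen))
```

Because the gain depends on the target's own feature values, two different targets can be assigned different acquisition orders under the same posterior, which is the target-aware behavior the abstract describes.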
