Nutrition and Health Data for Cost-Sensitive Learning

Traditionally, machine learning algorithms have focused on modeling a given dataset under the assumption that all of its features are available at no cost. In health analytics, however, data collection raises concerns that require careful consideration: monetary cost, patient discomfort during medical procedures, and the privacy impact of collection. An efficient solution would acquire only a subset of features, weighing the value each feature provides against its acquisition cost. Datasets that include feature costs are very limited, especially in healthcare. In this paper, we provide a health dataset as well as a method for assigning feature costs based on the total level of inconvenience that asking for each feature entails. Furthermore, using the suggested dataset, we compare recent and state-of-the-art approaches to cost-sensitive feature acquisition and learning. Specifically, we analyze the performance of major sensitivity-based and reinforcement-learning-based methods from the literature on three classification problems in the health domain: diabetes, heart disease, and hypertension.
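The idea of weighing a feature's value against its acquisition cost can be illustrated with a minimal sketch. It is not one of the methods compared in the paper: it assumes a fixed, precomputed value estimate per feature and greedily acquires features by value-per-cost ratio under a budget, whereas sensitivity-based and reinforcement learning approaches typically update their choices as feature values are observed. The function name, feature names, values, costs, and budget below are all hypothetical.

```python
from typing import Dict, List


def greedy_acquire(values: Dict[str, float],
                   costs: Dict[str, float],
                   budget: float) -> List[str]:
    """Select features in descending value-per-cost order within a budget.

    `values` holds an assumed (static) usefulness estimate per feature and
    `costs` holds a positive acquisition cost per feature.
    """
    # Rank features by how much estimated value each unit of cost buys.
    ranked = sorted(values, key=lambda f: values[f] / costs[f], reverse=True)

    chosen: List[str] = []
    spent = 0.0
    for feature in ranked:
        # Acquire the feature only if it still fits in the remaining budget.
        if spent + costs[feature] <= budget:
            chosen.append(feature)
            spent += costs[feature]
    return chosen


# Hypothetical numbers, chosen only to illustrate the trade-off.
example_values = {"age": 0.30, "bmi": 0.25, "blood_test": 0.40, "diet_survey": 0.10}
example_costs = {"age": 1.0, "bmi": 2.0, "blood_test": 9.0, "diet_survey": 4.0}

selected = greedy_acquire(example_values, example_costs, budget=8.0)
print(selected)
```

In this toy setting the most informative feature, the blood test, is skipped because its cost alone exceeds the budget, which is the kind of trade-off a cost-sensitive method has to make.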
