A configurable, big data system for on-demand healthcare cost prediction

Predictive modeling is becoming increasingly common in healthcare. Existing healthcare cost prediction solutions are tailor-made for specific tasks and populations, so adapting them to a different task or population requires expensive modifications. In this paper, we present a modular and extensible solution for healthcare cost prediction that can be easily configured for various prediction tasks and populations. Our solution incorporates efficient handling of high-dimensional data, smart feature engineering, flexible predictive learning, individualized assessment of predictors' cost impacts, and a management system that allows partial results to be reused. We configure two distinct applications using the proposed system and present results on prediction accuracy and cost impact assessment. The first application predicts healthcare costs for a commercial population; the second predicts the cost of care for a Medicaid population using an entirely different set of data, predictors, and assumptions.
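The abstract names configurability and reuse of partial results as core properties but does not say how they are implemented. The Python sketch below shows one way such a design could work, under assumptions not taken from the paper. A prediction task is a configuration with one entry per pipeline stage (population selection, feature engineering, model fitting). Each stage's output is cached under a hash of its own configuration plus all upstream configurations, so two tasks that share a population and feature set reuse those results and refit only the model. The class and function names, config keys, toy records, and the single-predictor least-squares learner are all hypothetical stand-ins, not the authors' system.

```python
import hashlib
import json
from functools import partial
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical sketch: every name here (PartialResultStore, run_pipeline, the stage
# functions, the config keys) is illustrative and not taken from the paper.
Record = Dict[str, Any]
Rows = List[Tuple[Dict[str, float], float]]
Config = Dict[str, Dict[str, Any]]


class PartialResultStore:
    """In-memory store of stage outputs, keyed by the configuration lineage that produced them."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(lineage: List[Tuple[str, Dict[str, Any]]]) -> str:
        # Hash this stage's config together with every upstream config, so a cached
        # result is reused only when everything that produced it is identical.
        payload = json.dumps(lineage, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        if key in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[key] = compute()
        return self._results[key]


def select_cohort(data: List[Record], cfg: Dict[str, Any]) -> List[Record]:
    # Population definition: keep members of one plan type (e.g. commercial or Medicaid).
    return [r for r in data if r["plan"] == cfg["plan"]]


def build_features(cohort: List[Record], cfg: Dict[str, Any]) -> Rows:
    # Feature engineering: keep the configured predictors and pair them with the cost target.
    return [({p: float(r[p]) for p in cfg["predictors"]}, float(r[cfg["target"]])) for r in cohort]


def fit_model(rows: Rows, cfg: Dict[str, Any]) -> Tuple[float, float]:
    # Toy learner: ordinary least squares on one configured predictor -> (intercept, slope).
    xs = [f[cfg["predictor"]] for f, _ in rows]
    ys = [y for _, y in rows]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    slope = cov / var
    return y_bar - slope * x_bar, slope


STAGES = [("cohort", select_cohort), ("features", build_features), ("model", fit_model)]


def run_pipeline(data: List[Record], config: Config, store: PartialResultStore) -> Any:
    # Assumes `data` is a fixed dataset; a real system would also key on a data version.
    lineage: List[Tuple[str, Dict[str, Any]]] = []
    result: Any = data
    for name, stage in STAGES:
        stage_cfg = config[name]
        lineage.append((name, stage_cfg))
        result = store.get_or_compute(
            PartialResultStore.key(lineage), partial(stage, result, stage_cfg)
        )
    return result


# Toy records (hypothetical): one prior-year cost and one next-year cost per member.
DATA = [
    {"plan": "commercial", "age": 30, "prior_cost": 100.0, "next_cost": 120.0},
    {"plan": "commercial", "age": 50, "prior_cost": 300.0, "next_cost": 340.0},
    {"plan": "commercial", "age": 40, "prior_cost": 200.0, "next_cost": 230.0},
    {"plan": "medicaid", "age": 20, "prior_cost": 50.0, "next_cost": 80.0},
    {"plan": "medicaid", "age": 60, "prior_cost": 400.0, "next_cost": 390.0},
]

COMMERCIAL_BY_COST: Config = {
    "cohort": {"plan": "commercial"},
    "features": {"predictors": ["age", "prior_cost"], "target": "next_cost"},
    "model": {"predictor": "prior_cost"},
}
# Same population and features, different learner config: only the model stage reruns.
COMMERCIAL_BY_AGE: Config = {**COMMERCIAL_BY_COST, "model": {"predictor": "age"}}

if __name__ == "__main__":
    store = PartialResultStore()
    print(run_pipeline(DATA, COMMERCIAL_BY_COST, store))
    print(run_pipeline(DATA, COMMERCIAL_BY_AGE, store))
    print(f"stage hits={store.hits}, misses={store.misses}")
```

Running the two commercial configurations in sequence recomputes only the model stage the second time, leaving the store with 2 hits and 4 misses. Keying on the whole upstream lineage rather than on a stage's own config is what makes reuse safe: a Medicaid task with the same feature config still gets fresh features because its cohort differs. A production system would also need to key results on a dataset version and persist them outside memory.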
