Predicting Future High-Cost Patients: A Real-World Risk Modeling Application

Health care data from patients in the Arizona Health Care Cost Containment System, Arizona's Medicaid program, provide a unique opportunity to apply state-of-the-art data processing and analysis algorithms and to produce actionable results that can aid cost containment. This work addresses the specific challenges of building predictive risk models that forecast future high-cost users in this real-life health care application. Predictive risk modeling of this kind has received attention in recent years, with statistical techniques forming the backbone of the proposed methods. We survey the literature and propose a novel data mining approach tailored to this application. Our empirical study indicates that the approach is useful and can benefit further research on cost containment in the health care industry.
