Using machine learning approaches to predict high-cost chronic obstructive pulmonary disease patients in China

Accurate identification and prediction of high-cost chronic obstructive pulmonary disease (COPD) patients is important for addressing the economic burden of COPD. This study had two objectives: to use machine learning approaches to identify and predict potential high-cost patients, and to explore the key variables of the forecasting model by comparing the predictive performance of different variable sets. Machine learning approaches were used to estimate the medical costs of COPD patients from the medical insurance data of a large city in western China. The prediction models were logistic regression, random forest (RF), and extreme gradient boosting (XGBoost). All three models had good predictive performance, and XGBoost outperformed the other two: the areas under the ROC curve for logistic regression, RF, and XGBoost were 0.787, 0.792, and 0.801, respectively. The precision and accuracy metrics indicated that the methods achieved correct and reliable results. The results of this study can be used by healthcare data analysts, policy makers, insurers, and healthcare planners to improve the delivery of health services.
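To make the comparison metric concrete, the following is a minimal sketch of how an area under the ROC curve can be computed and used to rank models. It is not the study's code: the function name `roc_auc`, the model names `model_a` and `model_b`, and the toy labels and scores are all hypothetical, and the rank-based (Mann-Whitney) formulation is one standard way to compute the statistic.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney U) formulation.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = high-cost, 0 = not high-cost.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]

# Hypothetical predicted probabilities from two illustrative models.
model_scores = {
    "model_a": [0.9, 0.2, 0.6, 0.4, 0.1, 0.8, 0.7, 0.3],
    "model_b": [0.7, 0.5, 0.4, 0.6, 0.2, 0.9, 0.3, 0.1],
}

# Score every model on the same labels, then pick the highest AUC.
auc_by_model = {name: roc_auc(y_true, s) for name, s in model_scores.items()}
best_model = max(auc_by_model, key=auc_by_model.get)

for name, auc in sorted(auc_by_model.items()):
    print(f"{name}: AUC = {auc:.3f}")
print("best:", best_model)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values near 0.8, such as those reported above, mean that a randomly chosen high-cost patient is scored above a randomly chosen other patient about four times out of five.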
