Impact of diagnosis code grouping method on clinical prediction model performance: A multi-site retrospective observational study

OBJECTIVE
The primary purpose of this work is to systematically assess the performance trade-offs of four diagnosis code groupings on clinical prediction tasks: AHRQ-Elixhauser, Single-level Clinical Classifications Software (CCS), truncated ICD-9-CM codes, and raw ICD-9-CM codes.

MATERIALS AND METHODS
We used two distinct datasets, drawn from different geographic regions and patient populations, and trained models for three prediction tasks: 1-year mortality following an ICU stay, 30-day mortality following surgery, and 30-day complications following surgery. We ran several commonly used binary classification models, including penalized logistic regression, random forest, and gradient boosted trees. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUCPR).

RESULTS
Single-level CCS, truncated codes, and raw codes significantly outperformed the AHRQ-Elixhauser grouping when predicting 30-day postoperative complications and 1-year mortality after ICU admission. Performance across groupings was more similar for the 30-day postoperative mortality task.

DISCUSSION
Single-level CCS groupings aggregate raw codes into meaningful clinical concepts and consistently balance interoperability between ICD-9-CM and ICD-10-CM while maintaining strong model performance as measured by AUROC and AUCPR. A key limitation is that experiments cover only two datasets and three prediction tasks. Although these tasks were well labeled and their outcomes sufficiently prevalent, they do not encompass all modeling tasks and outcomes.

CONCLUSION
Single-level CCS groupings may serve as a good baseline for future models that incorporate diagnosis codes as features in clinical prediction tasks. Code and a summary of the compute environment are provided with the analyses to enable reproducibility and support future research.
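To make the groupings and metrics named above concrete, here is a small illustrative sketch. It is not the authors' released code. It assumes that "truncated" means the three-character ICD-9-CM category (four characters for E-codes), because the abstract does not state the truncation level. Each patient's grouped codes are encoded as a binary (multi-hot) feature vector. AUROC is computed as a pairwise ranking probability, and AUCPR is estimated as average precision, which is one common step-wise estimator; whether the study used this estimator is an assumption. The function names (`truncate_icd9`, `multi_hot`, `auroc`, `average_precision`) and all example codes, labels, and scores are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def truncate_icd9(code: str) -> str:
    """Map an ICD-9-CM diagnosis code to its category.

    Assumption: the category is the first 3 characters (4 for E-codes);
    the abstract does not state the truncation level the study used.
    """
    code = code.strip().upper().replace(".", "")
    return code[:4] if code.startswith("E") else code[:3]


def multi_hot(patients: Sequence[Iterable[str]], vocab: Sequence[str]) -> List[List[int]]:
    """Encode each patient's grouped codes as a 0/1 vector over `vocab`."""
    index: Dict[str, int] = {c: i for i, c in enumerate(vocab)}
    rows: List[List[int]] = []
    for codes in patients:
        row = [0] * len(vocab)
        for c in codes:
            if c in index:  # codes outside the vocabulary are ignored
                row[index[c]] = 1
        rows.append(row)
    return rows


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scores above random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUCPR estimate: mean precision at the rank of each positive.

    Assumes no tied scores; with ties the result depends on sort order.
    """
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    tp, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k  # precision among the top-k predictions
    return total / n_pos


# Toy usage with hypothetical codes, labels, and model scores (not study data).
raw = [["428.0", "584.9"], ["428.1", "E880.9"], ["410.71"]]
grouped = [sorted({truncate_icd9(c) for c in codes}) for codes in raw]
vocab = sorted({c for codes in grouped for c in codes})
X = multi_hot(grouped, vocab)
print(vocab)  # ['410', '428', '584', 'E880']
print(X)      # [[0, 1, 1, 0], [0, 1, 0, 1], [1, 0, 0, 0]]

y = [1, 0, 1, 0]
s = [0.9, 0.8, 0.4, 0.3]
print(auroc(y, s))                        # 0.75
print(round(average_precision(y, s), 4))  # 0.8333
```

In the toy example, the distinct raw codes 428.0 and 428.1 collapse into a single 428 feature after truncation. This shrinks the feature space and discards detail, which is the trade-off the compared groupings make to different degrees. The metrics are written in pure Python to avoid dependencies. In practice, a library implementation such as scikit-learn's `roc_auc_score` and `average_precision_score` would typically be used instead.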
