Modeling Healthcare Quality via Compact Representations of Electronic Health Records

The increased availability of Electronic Health Record (EHR) data provides unique opportunities for improving the quality of health services. In this study, we couple EHRs with advanced machine learning tools to predict three important indicators of healthcare quality. Specifically, we describe how to learn low-dimensional vector representations of patient conditions and clinical procedures in an unsupervised manner, and how to generate feature vectors of hospitalized patients that are useful for predicting their length of stay, total incurred charges, and mortality. To learn the vector representations, we employ state-of-the-art language models specifically designed for modeling the co-occurrence of diseases and applied clinical procedures. The proposed model is trained on a large-scale EHR database comprising more than 35 million hospitalizations in California over a period of nine years. We compare the proposed approach to several alternatives and evaluate their effectiveness by measuring the accuracy of the regression and classification models used for the three predictive tasks. Our model outperforms the baseline models on all tasks, indicating strong potential of the proposed approach for advancing the quality of the healthcare system.
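The representation-learning idea can be illustrated with a minimal sketch. It is not the model used in the study: it assumes a skip-gram objective with negative sampling in which every code recorded in a hospitalization serves as context for every other code in that record, and it assumes a patient feature vector is simply the average of that patient's code vectors. The function names, the hyperparameters, and the toy code strings below are all hypothetical.

```python
import math
import random


def train_code_embeddings(visits, dim=8, epochs=50, lr=0.05, negatives=2, seed=0):
    """Learn one vector per code from co-occurrence within hospitalizations.

    Assumption: skip-gram with negative sampling, where all codes in the
    same hospitalization record are treated as each other's context.
    """
    rng = random.Random(seed)
    vocab = sorted({code for visit in visits for code in visit})
    index = {code: i for i, code in enumerate(vocab)}
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    def sigmoid(x):
        # Clamp the argument to avoid overflow in exp().
        return 1.0 / (1.0 + math.exp(-max(-20.0, min(20.0, x))))

    for _ in range(epochs):
        for visit in visits:
            for target in visit:
                for context in visit:
                    if context == target:
                        continue
                    t = index[target]
                    # One observed pair plus uniformly drawn negatives.
                    # (A negative may occasionally coincide with a true
                    # context code; this sketch does not filter that out.)
                    samples = [(index[context], 1.0)]
                    samples += [(rng.randrange(len(vocab)), 0.0)
                                for _ in range(negatives)]
                    grad_in = [0.0] * dim
                    for j, label in samples:
                        score = sum(a * b for a, b in zip(w_in[t], w_out[j]))
                        g = lr * (label - sigmoid(score))
                        for k in range(dim):
                            grad_in[k] += g * w_out[j][k]
                            w_out[j][k] += g * w_in[t][k]
                    for k in range(dim):
                        w_in[t][k] += grad_in[k]
    return {code: w_in[index[code]] for code in vocab}


def patient_vector(visit, embeddings):
    """Average the vectors of a patient's codes (an assumed aggregation)."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[code] for code in visit if code in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


# Toy hospitalization records; the code strings are made-up placeholders.
toy_visits = [
    ["dx:sepsis", "dx:pneumonia", "proc:ventilation"],
    ["dx:pneumonia", "proc:ventilation"],
    ["dx:fracture", "proc:cast"],
]
toy_embeddings = train_code_embeddings(toy_visits, dim=4, epochs=20)
toy_features = patient_vector(["dx:sepsis", "proc:ventilation"], toy_embeddings)
```

The resulting `toy_features` vector is the kind of fixed-length input that a downstream regression or classification model could consume for the three prediction tasks.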
