Semi-Supervised Prediction of Comorbid Rare Conditions Using Medical Claims Data

Medical insurance claims data offer a coarse view of a patient's medical profile, including information about previous diagnoses and procedures performed. These data have previously been used to predict the presence of conditions that have not yet manifested. Rarer conditions, however, provide an extremely limited amount of ground truth for training supervised models, even though predicting relevant comorbidities can help reduce failure to rescue from a treatable, yet potentially life-threatening, condition. In this paper, we take on the formidable task of improving models built to predict comorbidity of rare conditions that emerge during hospitalization. We present PreCoRC, a novel approach that leverages the hierarchical structures of diagnosis and procedure codes to compensate for the relatively low prevalence of specific types of Failure to Rescue (FTR) incidents. PreCoRC can be applied post hoc to previously learned predictive models and used to discover the parts of the underlying hierarchies that contribute to the task. Our experimental results demonstrate that PreCoRC shows promise for operational utility in clinical settings and offer insights into potential leading indicators of life-threatening complications.
