Learning (predictive) risk scores in the presence of censoring due to interventions

A large and diverse set of measurements is regularly collected during a patient’s hospital stay to monitor their health status. Tools that integrate these measurements into severity scores that accurately track changes in illness severity can improve clinicians’ ability to provide timely interventions. Existing approaches for creating such scores either (1) rely on experts to fully specify the severity score, (2) infer a score using detailed models of disease progression, or (3) train a predictive score, using supervised learning, by regressing against a surrogate marker of severity such as the presence of downstream adverse events. The first approach does not extend to diseases where an accurate score cannot be elicited from experts. The second assumes that the progression of disease can be accurately modeled, limiting its application to populations with simple, well-understood disease dynamics. The third approach, which is also the most commonly used, often produces scores that suffer from bias due to treatment-related censoring (Paxton et al. in AMIA annual symposium proceedings, American Medical Informatics Association, p 1109, 2013). Specifically, since the downstream outcomes used for training are observed only noisily and are influenced by treatment administration patterns, these scores do not generalize well when treatment administration patterns change. We propose a novel ranking-based framework for disease severity score learning (DSSL). DSSL exploits the following key observation: while it is challenging for experts to quantify the disease severity at any given time, it is often easy to compare the disease severity at two different times. Extending existing ranking algorithms, DSSL learns a function that maps a vector of a patient’s measurements to a scalar severity score subject to two constraints. First, the resulting score should be consistent with the expert’s ranking of the disease severity state. Second, changes in score between consecutive periods should be smooth. We apply DSSL to the problem of learning a sepsis severity score using a large, real-world electronic health record dataset. The learned scores significantly outperform state-of-the-art clinical scores in ranking patient states by severity and in early detection of downstream adverse events. We also show that the learned disease severity trajectories are consistent with clinical expectations of disease evolution. Further, we simulate datasets containing different treatment administration patterns and show that DSSL generalizes better to changes in treatment patterns than the three approaches above.
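The two constraints described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes a linear score w·x, a squared hinge penalty on expert-ordered pairs, a squared penalty on score changes between consecutive measurements, and plain gradient descent. The function names, the hyperparameters (lam_smooth, lam_l2, lr, n_iters) and the toy data are all hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def diff(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def fit_linear_severity_score(X, ordered_pairs, consecutive_pairs,
                              lam_smooth=0.1, lam_l2=0.01,
                              lr=0.05, n_iters=500):
    """Fit weights w for an assumed linear score(x) = w . x.

    ordered_pairs: (i, j) row indices where row i was judged MORE severe
        than row j (the ranking constraint).
    consecutive_pairs: (i, j) row indices of consecutive measurements of
        the same patient (the smoothness constraint).
    All hyperparameters are illustrative assumptions.
    """
    n_features = len(X[0])
    d_rank = [diff(X[i], X[j]) for i, j in ordered_pairs]
    d_smooth = [diff(X[i], X[j]) for i, j in consecutive_pairs]
    w = [0.0] * n_features
    for _ in range(n_iters):
        # Gradient of the L2 regularizer lam_l2 * ||w||^2.
        grad = [2.0 * lam_l2 * wk for wk in w]
        # Ranking term: mean of max(0, 1 - w.d)^2 over ordered pairs.
        for d in d_rank:
            slack = max(0.0, 1.0 - dot(w, d))
            for k in range(n_features):
                grad[k] -= 2.0 * slack * d[k] / len(d_rank)
        # Smoothness term: lam_smooth * mean of (w.d)^2 over consecutive pairs.
        for d in d_smooth:
            change = dot(w, d)
            for k in range(n_features):
                grad[k] += 2.0 * lam_smooth * change * d[k] / len(d_smooth)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def severity_score(w, x):
    """Scalar severity score of one measurement vector."""
    return dot(w, x)


# Toy data (hypothetical): four time points of one patient, two features.
# The first feature rises with severity; the second is uninformative.
X_toy = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
ordered_toy = [(1, 0), (2, 1), (3, 2), (3, 0)]   # (more severe, less severe)
consecutive_toy = [(0, 1), (1, 2), (2, 3)]       # adjacent time points
w_toy = fit_linear_severity_score(X_toy, ordered_toy, consecutive_toy)
print([round(severity_score(w_toy, x), 3) for x in X_toy])
```

In this sketch, lam_smooth trades off agreement with the ordered pairs against the size of score changes between adjacent time points; a larger value flattens the learned trajectory. A nonlinear scoring function could replace the linear one without changing the structure of the objective.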

[1] E. Draper, et al. APACHE II: A severity of disease classification system, 1985, Critical care medicine.

[2] Jstor. Journal of the Royal Statistical Society. Series D (The Statistician), 1993.

[3] C. Sprung, et al. Multiple organ dysfunction score: a reliable descriptor of a complex clinical outcome, 1995, Critical care medicine.

[4] Inger, et al. A prediction rule to identify low-risk patients with community-acquired pneumonia, 1997, The New England journal of medicine.

[5] Ralf Herbrich, et al. Large margin rank boundaries for ordinal regression, 2000.

[6] Thore Graepel, et al. Large Margin Rank Boundaries for Ordinal Regression, 2000.

[7] J. Friedman. Greedy function approximation: A gradient boosting machine, 2001.

[8] G. Clermont, et al. Predicting hospital mortality for patients in the intensive care unit: A comparison of artificial neural networks with logistic regression models, 2001, Critical care medicine.

[9] R G Mark, et al. MIMIC II: a massive temporal ICU patient database to support research in intelligent patient monitoring, 2002, Computers in Cardiology.

[10] Thorsten Joachims, et al. Optimizing search engines using clickthrough data, 2002, KDD.

[11] Simon G. Thompson, et al. Multistate Markov models for disease progression with classification error, 2003.

[12] T. Medsger, et al. Assessment of disease severity and prognosis, 2003, Clinical and experimental rheumatology.

[13] J. Vincent, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure, 1996, Intensive Care Medicine.

[14] Gregory N. Hullender, et al. Learning to rank using gradient descent, 2005, ICML.

[15] H. Zou, et al. Regularization and variable selection via the elastic net, 2005.

[16] K. Hornik, et al. Unbiased Recursive Partitioning: A Conditional Inference Framework, 2006.

[17] Christopher J. C. Burges, et al. High accuracy retrieval with multiple nested ranker, 2006, SIGIR.

[18] Quoc V. Le, et al. Learning to Rank with Nonsmooth Cost Functions, 2006, NIPS.

[19] Tao Qin, et al. Ranking with multiple hyperplanes, 2007, SIGIR.

[20] A. Kramer, et al. Effect of a rapid response system for patients in shock on time to treatment and mortality during 5 years*, 2007, Critical care medicine.

[21] Hongyuan Zha, et al. A General Boosting Method and its Application to Learning Ranking Functions for Web Search, 2007, NIPS.

[22] Tao Qin, et al. FRank: a ranking method with fidelity loss, 2007, SIGIR.

[23] Wei Chu, et al. Support Vector Ordinal Regression, 2007, Neural Computation.

[24] Peter L. Bartlett, et al. Boosting Algorithms as Gradient Descent in Function Space, 2007.

[25] A. Abu-Hanna, et al. Evaluation of SOFA-based models for predicting mortality in the ICU: A systematic review, 2008, Critical care.

[26] J. Barker. Effect of a rapid response system for patients in shock on time to treatment and mortality during 5 years, 2009.

[27] Caleb W. Hug, et al. Detecting hazardous intensive care patient episodes using real-time mortality models, 2009.

[28] S. Sathiya Keerthi, et al. Efficient algorithms for ranking with SVMs, 2010, Information Retrieval.

[29] Christopher J. C. Burges, et al. From RankNet to LambdaRank to LambdaMART: An Overview, 2010.

[30] D. Koller, et al. Integration of Early Physiological Responses Predicts Later Illness Severity in Preterm Infants, 2010, Science Translational Medicine.

[31] M. Saeed, et al. Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): A Public-Access Intensive Care Unit Database, 2011.

[32] G. Kumar, et al. Nationwide trends of severe sepsis in the 21st century (2000-2007), 2011, Chest.

[33] T. H. Kyaw, et al. Multiparameter Intelligent Monitoring in Intensive Care II: A public-access intensive care unit database*, 2011, Critical care medicine.

[34] M. Keegan, et al. Severity of illness scoring systems in the intensive care unit, 2011, Critical care medicine.

[35] H. Bitterman, et al. Assessment of disease-severity scoring systems for patients with sepsis in general internal medicine departments, 2011, Critical care.

[36] Kilian Q. Weinberger, et al. Web-Search Ranking with Initialized Gradient Boosted Regression Trees, 2010, Yahoo! Learning to Rank Challenge.

[37] Jenna Wiens, et al. Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task, 2012, NIPS.

[38] C. Sprung, et al. Surviving Sepsis Campaign: International Guidelines for Management of Severe Sepsis and Septic Shock, 2012, 2013, Intensive Care Medicine.

[39] Cheng H. Lee, et al. Imputation-Enhanced Prediction of Septic Shock in ICU Patients, 2012.

[40] D. Mould. Models for Disease Progression: New Approaches and Uses, 2012, Clinical pharmacology and therapeutics.

[41] Suchi Saria, et al. Developing Predictive Models Using Electronic Medical Records: Challenges and Pitfalls, 2013, AMIA.

[42] Chih-Jen Lin, et al. Large-scale Kernel RankSVM, 2014, SDM.

[43] Xiang Wang, et al. Unsupervised learning of disease progression models, 2014, KDD.

[44] P. Pronovost, et al. A targeted real-time early warning score (TREWScore) for septic shock, 2015, Science Translational Medicine.

[45] Suchi Saria, et al. Learning a Severity Score for Sepsis: A Novel Approach based on Clinical Comparisons, 2015, AMIA.

[46] M. J. van der Laan, et al. Mortality prediction in intensive care units with the Super ICU Learner Algorithm (SICULA): a population-based study, 2015, The Lancet. Respiratory medicine.