Flexible Modelling of Longitudinal Medical Data

Using electronic medical records to learn personalized risk trajectories is challenging: a patient's history often contains very few samples, and the samples that are available vary widely in information content. In this article, we consider how to integrate sparsely sampled longitudinal data, missing measurements that are informative of the underlying health status, and static information to estimate personalized survival distributions that are updated dynamically as new information becomes available. We achieve this by developing a nonparametric probabilistic model that generates survival trajectories, with corresponding uncertainty estimates, from an ensemble of Bayesian trees. Time is incorporated explicitly as an input, so the model learns variable interactions over time without the longitudinal process having to be specified beforehand. As a result, the changing influence of variables on survival over time is inferred directly from the data, and we analyze it with post-processing statistics derived from the model.
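The sketch below shows one common way to turn a tree ensemble with time as an input into survival trajectories with uncertainty. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a discrete-time formulation with a probit link, similar to those used in BART-based survival models: the per-interval hazard is h(t | x) = Φ(f(x, t)), and survival is S(t_k | x) = ∏_{j ≤ k} (1 − h(t_j | x)). Each posterior draw of the ensemble yields one survival curve, and summarizing the curves gives a posterior mean and an equal-tailed credible interval. The functions produced by `make_draw`, the covariate names `sbp_z` and `sbp_missing`, and all coefficients are hypothetical linear stand-ins for tree-ensemble draws. A real tree ensemble could also split on t, which would let covariate effects change over time; these linear stand-ins cannot.

```python
import math
import random

def probit(z: float) -> float:
    """Standard normal CDF: maps a latent ensemble output to a hazard in (0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def survival_curve(hazards):
    """Discrete-time survival: S(t_k) = prod_{j <= k} (1 - h_j)."""
    curve, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h
        curve.append(s)
    return curve

def quantile(sorted_vals, q):
    """Linearly interpolated empirical quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def posterior_survival(draws, x, times, level=0.95):
    """Build one survival curve per posterior draw f(x, t), then summarize each
    time point as (posterior mean, lower, upper) of an equal-tailed interval."""
    curves = [survival_curve([probit(f(x, t)) for t in times]) for f in draws]
    alpha = (1.0 - level) / 2.0
    summary = []
    for k in range(len(times)):
        vals = sorted(curve[k] for curve in curves)
        summary.append((sum(vals) / len(vals),
                        quantile(vals, alpha),
                        quantile(vals, 1.0 - alpha)))
    return summary

def make_draw(b0, b1):
    """HYPOTHETICAL linear stand-in for one posterior draw of a tree ensemble.
    Time t is an ordinary input; a missingness indicator is used as a feature."""
    return lambda x, t: b0 + b1 * x["sbp_z"] + 0.3 * x["sbp_missing"] + 0.1 * t

random.seed(0)
draws = [make_draw(random.gauss(-2.0, 0.1), random.gauss(0.4, 0.05)) for _ in range(200)]
times = list(range(1, 11))                  # hypothetical discrete follow-up intervals
patient = {"sbp_z": 1.2, "sbp_missing": 0}  # hypothetical latest covariate values

for t, (mean, lower, upper) in zip(times, posterior_survival(draws, patient, times)):
    print(f"t={t:2d}  S={mean:.3f}  95% CrI [{lower:.3f}, {upper:.3f}]")
```

Computing a full curve per draw, rather than plugging the mean hazard into the product, keeps the credible interval on S(t) consistent with the posterior. When a new measurement arrives, the covariate dictionary can be updated and the curves recomputed.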
