Individual dynamic prediction of clinical endpoint from large dimensional longitudinal biomarker history: a landmark approach

The individual data collected throughout patient follow-up constitute crucial information for assessing the risk of a clinical event, and eventually for adapting a therapeutic strategy. Joint models and landmark models have been proposed to compute individual dynamic predictions from repeated measures of one or two markers. However, they hardly extend to the case where the complete patient history includes many more repeated markers. Our objective was thus to propose a solution for the dynamic prediction of a health event that can exploit repeated measures of a possibly large number of markers. We combined a landmark approach extended to endogenous marker history with machine learning methods adapted to survival data. Each marker trajectory is modeled using the information collected up to the landmark time, and summary variables that best capture the individual trajectories are derived. These summaries and additional covariates are then included in different prediction methods. To handle a possibly high-dimensional history, we rely on machine learning methods adapted to survival data, namely regularized regressions and random survival forests, to predict the event from the landmark time, and we show how they can be combined into a superlearner. Predictive performance is then evaluated by cross-validation using estimators of the Brier score and of the area under the receiver operating characteristic curve adapted to censored data. We demonstrate in a simulation study the benefits of machine learning survival methods over standard survival models, especially in the case of numerous and/or nonlinear relationships between the predictors and the event. We then applied the methodology in two prediction contexts: a clinical context with the prediction of death for patients with primary biliary cholangitis, and a public health context with the prediction of death in the general elderly population at different ages.
Our methodology, implemented in R, enables the prediction of an event using the entire longitudinal patient history, even when the number of repeated markers is large. Although introduced with mixed models for the repeated markers and methods for a single right-censored time-to-event outcome, our method can be used with any other appropriate modeling technique for the markers and can be easily extended to the competing risks setting.
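
The first step of the landmark approach, summarizing each marker trajectory from the data available up to the landmark time, can be illustrated with a minimal Python sketch. It is not the authors' R implementation: a per-subject ordinary least-squares line stands in for the mixed models used in the paper, and the function names (`landmark_summaries`, `build_landmark_features`), the data layout, and the marker name `bili` are hypothetical choices made for illustration only.

```python
import numpy as np


def landmark_summaries(times, values, landmark):
    """Summarize one marker trajectory using only visits at or before `landmark`.

    Returns (level, slope): the fitted value at the landmark time and the slope
    of a least-squares line. ASSUMPTION: a simple per-subject line replaces the
    mixed-model predictions described in the abstract.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    keep = times <= landmark              # never use post-landmark information
    t, y = times[keep], values[keep]
    if t.size == 0:
        return (float("nan"), float("nan"))
    if t.size == 1 or t.max() == t.min():
        return (float(y.mean()), 0.0)     # slope cannot be estimated; assume flat
    slope, intercept = np.polyfit(t, y, deg=1)
    return (float(intercept + slope * landmark), float(slope))


def build_landmark_features(history, follow_up, landmark):
    """Build one feature row per subject still at risk at the landmark time.

    history:   {subject_id: {marker_name: (times, values)}}
    follow_up: {subject_id: observed event or censoring time}
    """
    rows = {}
    for sid, markers in history.items():
        if follow_up[sid] <= landmark:    # event or censoring before landmark
            continue
        feats = {}
        for name, (t, y) in markers.items():
            level, slope = landmark_summaries(t, y, landmark)
            feats[f"{name}_level"] = level
            feats[f"{name}_slope"] = slope
        rows[sid] = feats
    return rows


# Toy data: subject "a" is still at risk at landmark 2.0, subject "b" is not.
history = {
    "a": {"bili": ([0, 1, 2, 3], [1.0, 2.0, 3.0, 10.0])},
    "b": {"bili": ([0, 1], [5.0, 5.0])},
}
follow_up = {"a": 4.0, "b": 1.5}
features = build_landmark_features(history, follow_up, landmark=2.0)
```

In this toy example the visit at time 3 for subject "a" is ignored because it falls after the landmark. The resulting rows, together with baseline covariates, would then be passed to a survival learner such as a penalized Cox model or a random survival forest.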
