DeepPseudo: Pseudo Value Based Deep Learning Models for Competing Risk Analysis

Competing Risk Analysis (CRA) aims to correctly estimate the marginal probability that an event occurs in the presence of competing events. Many statistical approaches developed for CRA are limited by strong assumptions about the underlying stochastic processes. To overcome these limitations and to handle censoring, machine learning approaches for CRA have relied on specialized cost functions. However, these approaches are not generalizable and are computationally expensive. This paper formulates CRA as a cause-specific regression problem and proposes DeepPseudo models, which use simple and effective feed-forward deep neural networks to predict the cumulative incidence function (CIF) from pseudo values based on the Aalen-Johansen estimator. DeepPseudo models capture the time-varying effect of covariates on the CIF while handling censored observations. We also show how DeepPseudo models can address covariate-dependent censoring by using modified pseudo values. Experiments on real and synthetic datasets demonstrate that the proposed models obtain promising and statistically significant results compared to state-of-the-art CRA approaches. Furthermore, we show that explainability methods such as Layer-wise Relevance Propagation can be used to interpret the predictions of DeepPseudo models.
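The pseudo-value construction that the abstract relies on can be sketched in a few lines. The snippet is a minimal illustration, not the DeepPseudo authors' implementation. It assumes the standard jackknife definition θ̂_ik(t) = n·F̂_k(t) − (n−1)·F̂_k^(−i)(t). Here F̂_k is the Aalen-Johansen estimate of the CIF for cause k computed from all n subjects, and F̂_k^(−i) is the same estimate with subject i left out. The function names `aalen_johansen_cif` and `pseudo_values` and the toy data are hypothetical. Ties follow the usual convention that a subject censored at an event time still counts as at risk at that time.

```python
# Minimal sketch with hypothetical helper names; not the DeepPseudo reference code.
import numpy as np


def aalen_johansen_cif(times, events, cause, t):
    """Aalen-Johansen estimate of the CIF of `cause` at time `t`.

    times  : observed times (event or censoring)
    events : 0 = censored, k >= 1 = failure from cause k
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events)
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    # np.unique returns the distinct event times in ascending order
    for s in np.unique(times[events > 0]):
        if s > t:
            break
        at_risk = np.sum(times >= s)  # subjects still under observation at s
        d_all = np.sum((times == s) & (events > 0))  # failures from any cause
        d_k = np.sum((times == s) & (events == cause))  # failures from `cause`
        cif += surv * d_k / at_risk
        surv *= 1.0 - d_all / at_risk
    return cif


def pseudo_values(times, events, cause, eval_times):
    """Jackknife pseudo values: one row per subject, one column per time."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events)
    n = len(times)
    pv = np.empty((n, len(eval_times)))
    for j, t in enumerate(eval_times):
        full = aalen_johansen_cif(times, events, cause, t)
        for i in range(n):
            keep = np.arange(n) != i  # leave subject i out
            loo = aalen_johansen_cif(times[keep], events[keep], cause, t)
            pv[i, j] = n * full - (n - 1) * loo
    return pv


# Hypothetical toy data: the subject observed at t = 2 is censored.
times = [1, 2, 3, 4]
events = [1, 0, 2, 1]
print(aalen_johansen_cif(times, events, cause=1, t=4.0))  # 0.625
print(pseudo_values(times, events, cause=1, eval_times=[2.5, 4.0]))
```

Each column of the returned matrix could serve as the regression target for one evaluation time. A feed-forward network that maps covariates to cause-k CIF values at those times could then be fit with, for example, a squared-error loss. When nothing is censored, the pseudo values reduce to the indicators I(T_i ≤ t, cause_i = k). Censoring is therefore what makes the jackknife step necessary. This version assumes that censoring does not depend on covariates. It does not show the modified pseudo values used for covariate-dependent censoring. It also needs n + 1 estimator evaluations per time point, which is fine for illustration but slow on large cohorts.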
