Evaluation of survival distribution predictions with discrimination measures

In this paper we consider how to evaluate survival distribution predictions with measures of discrimination. This is a non-trivial problem: discrimination measures are the most commonly used measures in survival analysis, yet there is no clear method for deriving a risk prediction from a distribution prediction. We survey methods proposed in the literature and in software and consider their respective advantages and disadvantages. Whilst distributions are frequently evaluated by discrimination measures, we find that the method for doing so is rarely described in the literature and often leads to unfair comparisons. We find that the most robust method of reducing a distribution to a risk is to sum over the predicted cumulative hazard. We recommend that machine learning survival analysis software implements clear transformations between distribution and risk predictions in order to allow more transparent and accessible model evaluation.

∗raphaelsonabend@gmail.com
arXiv:2112.04828v1 [stat.ML] 9 Dec 2021
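The reduction named in the abstract, summing over the predicted cumulative hazard, can be illustrated with a minimal sketch. It is not taken from the paper or from any package: the function names `risk_from_survival` and `harrell_c` are hypothetical, the cumulative hazard is assumed to be obtained as H(t) = -log S(t) from survival probabilities predicted on a shared time grid, and the small clipping constant `eps` is an assumption added only to avoid log(0). The resulting risk score is then passed to a plain implementation of Harrell's concordance index as one example of a discrimination measure.

```python
import math


def risk_from_survival(surv_probs, eps=1e-12):
    """Reduce one predicted survival curve to a single risk score.

    surv_probs: S(t_1), ..., S(t_k) on a time grid shared by all subjects.
    Assumption: H(t) = -log S(t); the score is the sum of H over the grid,
    so a higher score means a higher predicted risk.
    """
    return sum(-math.log(max(s, eps)) for s in surv_probs)


def harrell_c(times, events, risks):
    """Harrell's C: fraction of comparable pairs that are ranked concordantly.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]. Ties in the risk score count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy data (made up for illustration): three subjects, three grid points.
surv = [
    [0.90, 0.70, 0.40],
    [0.95, 0.90, 0.80],
    [0.60, 0.30, 0.10],
]
times = [5, 9, 2]    # observed follow-up times
events = [1, 0, 1]   # 1 = event observed, 0 = censored

risks = [risk_from_survival(s) for s in surv]
c_index = harrell_c(times, events, risks)
print(risks, c_index)
```

Because every model is reduced to a risk with the same explicit transformation, the resulting concordance values are comparable across models; other reductions would generally give different rankings and should be reported when used.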
