AutoScore-Survival: Developing interpretable machine learning-based time-to-event scores with right-censored survival data

Background: Scoring systems are highly interpretable and widely used to evaluate time-to-event outcomes in healthcare research. However, existing time-to-event scores are predominantly created ad hoc from a few variables manually selected on the basis of clinicians' knowledge, suggesting an unmet need for a robust and efficient generic score-generating method. Methods: AutoScore was previously developed as an interpretable machine learning score generator that integrates machine learning with point-based scores, combining strong discriminability with accessibility. We have extended it to time-to-event data and developed AutoScore-Survival for automatically generating time-to-event scores from right-censored survival data. A random survival forest provided an efficient solution for selecting variables, and Cox regression was used for score weighting. We implemented the proposed method as an R package. We illustrated the method in a real-life study of 90-day mortality of patients in intensive care units and compared its performance with survival models (i.e., Cox) and the random survival forest. Results: The AutoScore-Survival-derived scoring model was more parsimonious than survival models built using traditional variable selection methods (e.g., the penalized likelihood approach and stepwise variable selection), and its performance was comparable to that of survival models using the same set of variables. Although AutoScore-Survival achieved a comparable
