An evaluation of machine learning techniques to predict the outcome of children treated for Hodgkin-Lymphoma on the AHOD0031 trial

ABSTRACT In this manuscript, we analyze a data set of children with Hodgkin lymphoma (HL) enrolled on a clinical trial. The data include the treatments received and survival status, along with covariates such as demographics and clinical measurements. Our main goal is to explore whether machine learning (ML) algorithms can improve on the Cox proportional hazards (CoxPH) model in a survival analysis context. We discuss the weaknesses of the CoxPH model that we would like to address, then introduce multiple algorithms, from well-established ones to state-of-the-art models, designed to address them. We then compare all models using the concordance index and the Brier score. Finally, based on our experience, we offer a series of recommendations for practitioners who would like to benefit from recent advances in artificial intelligence.
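
As an illustration of the first comparison metric, the concordance index can be sketched in a few lines of Python. This is a minimal sketch of Harrell's estimator under stated assumptions, not the evaluation code used in the study: the function name `concordance_index`, its arguments, and the toy data are hypothetical, higher risk scores are assumed to indicate shorter expected survival, and pairs with tied event times are simply skipped.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the risk scores rank correctly.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if the subject was censored
    risk_scores -- model output; higher is assumed to mean higher risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event has the higher risk: correct order
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
perfect = concordance_index(toy_times, toy_events, [0.9, 0.7, 0.5, 0.1])
partial = concordance_index(toy_times, toy_events, [0.9, 0.1, 0.5, 0.7])
```

A value of 1 means every comparable pair is ordered correctly and 0.5 corresponds to random ordering. The index measures discrimination only, so a calibration-sensitive measure such as the Brier score is a natural complement.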
