A comparison of machine learning methods for survival analysis of high-dimensional clinical data for dementia prediction

Data collected from clinical trials and cohort studies, such as dementia studies, are often high-dimensional, censored, and heterogeneous, and contain missing information, which presents challenges for traditional statistical analysis. There is an urgent need for methods that can overcome these challenges and model such complex data. At present there is no cure for dementia and no treatment that can successfully change the course of the disease. Machine learning models that predict the time until a patient develops dementia are important tools for understanding dementia risk, and they can give more accurate results than traditional statistical methods when modelling high-dimensional, heterogeneous clinical data. This work compares the performance and stability of ten machine learning algorithms, combined with eight feature selection methods, that are capable of performing survival analysis on high-dimensional, heterogeneous clinical data. We developed models that predict survival to dementia using baseline data from two studies. The Sydney Memory and Ageing Study (MAS) is a longitudinal cohort study of 1037 participants, aged 70–90 years, that aims to determine the effects of ageing on cognition. The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a longitudinal study aimed at identifying biomarkers for the early detection and tracking of Alzheimer's disease. Using the concordance index as the measure of performance, our models achieve maximum values of 0.82 for MAS and 0.93 for ADNI.
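
To illustrate the performance measure named above, the following is a minimal sketch of a pairwise concordance index for right-censored data, in the spirit of Harrell's C. It is not the implementation used in this work: the function name `concordance_index` and the toy data are hypothetical, and tied event times are simply ignored for brevity. A pair of subjects is treated as comparable when the subject with the shorter follow-up time had an observed event; the pair is concordant when that subject also received the higher predicted risk.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable pairs ordered correctly by predicted risk.

    times       -- follow-up time for each subject
    events      -- True if the event (e.g. dementia) was observed,
                   False if the subject was censored
    risk_scores -- model output, higher means earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored subject cannot anchor a pair: its true event
            # time is unknown, so we cannot say it failed first.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third is censored at t=6.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]
risk = [0.9, 0.2, 0.5, 0.1]
print(concordance_index(times, events, risk))
```

On this scale a value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of the comparable pairs, which gives context for the reported values of 0.82 and 0.93.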
