Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis

Aims
Traditional prognostic risk assessment in patients undergoing non-invasive imaging is based upon a limited selection of clinical and imaging findings. Machine learning (ML) can consider a greater number and complexity of variables. We therefore investigated the feasibility and accuracy of ML for predicting 5-year all-cause mortality (ACM) in patients undergoing coronary computed tomographic angiography (CCTA), and compared its performance with that of existing clinical or CCTA metrics.

Methods and results
The analysis included 10 030 patients with suspected coronary artery disease and 5-year follow-up from the COronary CT Angiography EvaluatioN For Clinical Outcomes: An InteRnational Multicenter (CONFIRM) registry. All patients underwent CCTA as their standard of care. Twenty-five clinical and 44 CCTA parameters were evaluated, including the segment stenosis score (SSS), segment involvement score (SIS), modified Duke index (DI), number of segments with non-calcified, mixed, or calcified plaques, age, sex, gender, standard cardiovascular risk factors, and the Framingham risk score (FRS). The ML approach comprised automated feature selection by information gain ranking, model building with a boosted ensemble algorithm, and 10-fold stratified cross-validation. Seven hundred and forty-five patients died during 5-year follow-up. For predicting ACM, ML achieved a higher area under the receiver operating characteristic curve (AUC) than the FRS or the CCTA severity scores (SSS, SIS, DI) alone (ML: 0.79 vs. FRS: 0.61, SSS: 0.64, SIS: 0.64, DI: 0.62; P < 0.001).

Conclusions
Machine learning combining clinical and CCTA data predicted 5-year ACM significantly better than existing clinical or CCTA metrics alone.
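Information gain ranking scores each candidate variable by how much knowing its value reduces uncertainty (entropy) about the outcome. The sketch below is a minimal illustration, assuming discrete features and a binary outcome (1 = died, 0 = survived); continuous variables such as age or the FRS would first need to be discretised. The feature names and toy values are hypothetical and are not registry data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in outcome entropy after splitting on a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data (not registry data): 1 = died during follow-up.
died = [1, 1, 1, 0, 0, 0, 0, 0]
columns = {
    "age_over_70": [1, 1, 1, 0, 0, 0, 1, 0],  # hypothetical binary feature
    "diabetes": [1, 0, 1, 0, 1, 0, 0, 0],     # hypothetical binary feature
    "smoker": [1, 0, 0, 1, 0, 1, 0, 0],       # hypothetical binary feature
}
for name, gain in rank_features(columns, died):
    print(f"{name}: {gain:.3f}")
```

Features would then be retained or dropped according to their rank; how many to keep is a modelling choice this sketch leaves open.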
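The abstract names only "a boosted ensemble algorithm". As a minimal sketch of boosting in general, the block below swaps in discrete AdaBoost with one-feature decision stumps on binary features; it is a stand-in, not necessarily the algorithm used in the study. Labels are assumed to be +1 (died) and -1 (survived), and the three-feature toy cohort is hypothetical.

```python
import math
from itertools import product


def train_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with one-feature stumps on binary (0/1) features.

    y must use labels -1/+1. Returns a list of (feature_index, polarity, alpha).
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n  # start with equal weight on every patient
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            for polarity in (1, -1):
                # Stump predicts +polarity when feature j is 1, -polarity otherwise.
                preds = [polarity if row[j] == 1 else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity, preds)
        err, j, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((j, polarity, alpha))
        # Up-weight misclassified patients, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def score(ensemble, row):
    """Alpha-weighted vote of the stumps; higher means higher predicted risk."""
    return sum(alpha * (pol if row[j] == 1 else -pol) for j, pol, alpha in ensemble)


# Hypothetical toy cohort: three binary risk markers per patient; a patient
# "dies" (+1) when at least two markers are present, otherwise survives (-1).
X = [list(row) for row in product([0, 1], repeat=3)]
y = [1 if sum(row) >= 2 else -1 for row in X]
model = train_adaboost(X, y, n_rounds=3)
for row, label in zip(X, y):
    print(row, label, round(score(model, row), 3))
```

Each round reweights the patients that earlier stumps misclassified, so later stumps concentrate on harder cases; the summed votes give a continuous risk score that can be ranked for ROC analysis.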
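Stratified 10-fold cross-validation splits the cohort into ten folds that each preserve the overall death rate, then trains on nine folds and tests on the held-out fold in turn. A minimal sketch, assuming 0/1 outcome labels; the helper names and the 1000-patient toy cohort (75 deaths, roughly the registry's 745 of 10 030) are hypothetical. In a pipeline like the one described, feature ranking and model fitting would be repeated on each training split so that the held-out fold never influences them.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index 0..k-1 so that every fold keeps
    roughly the same outcome proportion as the full cohort."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [0] * len(labels)
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)  # randomise which patients land in which fold
        for pos, i in enumerate(idx):
            fold_of[i] = (offset + pos) % k  # deal patients round-robin
        offset += len(idx)  # continue the round-robin so fold sizes stay balanced
    return fold_of


def train_test_indices(fold_of, fold):
    """Training and test indices for one cross-validation fold."""
    train = [i for i, f in enumerate(fold_of) if f != fold]
    test = [i for i, f in enumerate(fold_of) if f == fold]
    return train, test


labels = [1] * 75 + [0] * 925  # hypothetical cohort: 1 = died
fold_of = stratified_folds(labels, k=10, seed=42)
for fold in range(10):
    train, test = train_test_indices(fold_of, fold)
    deaths = sum(labels[i] for i in test)
    print(fold, len(train), len(test), deaths)
```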
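The AUC values compared in the abstract can be computed from any risk score with the rank-based (Mann-Whitney) formula: the AUC equals the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor, with ties counted as one half. A minimal sketch, assuming 1 = died and 0 = survived; the five-patient scores are hypothetical. The P value in the abstract would additionally require a test for correlated ROC curves (for example, DeLong's method), which this sketch does not implement.

```python
def auc(scores, labels):
    """ROC AUC via average ranks (Mann-Whitney U), handling tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical scores for five patients (1 = died).
labels = [1, 1, 0, 0, 0]
model_score = [0.9, 0.4, 0.5, 0.2, 0.1]   # e.g. a combined model
simple_score = [0.3, 0.6, 0.7, 0.1, 0.6]  # e.g. a single clinical score
print(auc(model_score, labels), auc(simple_score, labels))
```

The rank-based form runs in O(n log n), which matters for a cohort of about 10 000 patients, whereas comparing every death with every survivor directly would take O(n_pos × n_neg) operations.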
