Machine learning algorithms for predicting coronary artery disease: efforts toward an open source solution

The development of coronary artery disease (CAD), one of the most prevalent diseases in the world, is heavily influenced by several modifiable risk factors. Predictive models built with machine learning (ML) algorithms may help healthcare practitioners detect CAD in a timely manner and may ultimately improve outcomes. In this study, we applied six different ML algorithms to predict the presence of CAD among patients in "the Cleveland dataset," an openly available dataset provided by the University of California Irvine (UCI) Machine Learning Repository. All six ML algorithms achieved accuracies greater than 80%, and the "Neural Network" algorithm achieved an accuracy greater than 93%. The recall achieved with the "Neural Network" model (0.93) was also the highest of the six models. Five of the six algorithms produced very similar AUC-ROC curves. The curve for the "Neural Network" algorithm is slightly steeper, implying that this model achieves a higher true positive rate. We also extracted the variables of importance in the "Neural Network" model to help in risk assessment. We have released the full computer code generated in this study in the public domain as a preliminary effort toward an open solution for predicting the presence of coronary artery disease in a given population, and we present a workflow model for implementing a possible solution.
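To make the reported metrics concrete, the following is a minimal sketch of how accuracy, recall, and the area under the ROC curve can be computed for a binary CAD classifier. It is not the code released with the study. The labels and scores are made-up illustrative values, not data from the Cleveland dataset, and the function names and the 0.5 decision threshold are assumptions chosen for this example.

```python
# Illustrative sketch only: hypothetical labels and scores, not study data.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """True positives divided by all actual positives (sensitivity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a randomly chosen positive case receives a higher
    score than a randomly chosen negative case, counting ties as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = CAD present, 0 = CAD absent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1]  # assumed model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("recall:  ", recall(y_true, y_pred))
print("AUC-ROC: ", roc_auc(y_true, scores))
```

Accuracy and recall depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why models with similar accuracies can still differ in their ROC curves.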
