Prediction of Coronary Artery Calcium Score Using Machine Learning in a Healthy Population

Background: Coronary artery calcium score (CACS) is a reliable predictor of future cardiovascular disease risk. Although deep learning studies using computed tomography (CT) images to predict CACS have been reported, no study has assessed the feasibility of using machine learning (ML) algorithms to predict CACS from clinical variables in a healthy general population. Therefore, we aimed to assess whether ML algorithms other than binary logistic regression (BLR) could predict high CACS in a healthy population using general health examination data.

Methods: This retrospective observational study included participants who underwent regular health screening that included coronary CT angiography. High CACS was defined as an Agatston score ≥ 100. Univariable and multivariable BLR were performed to identify predictors of high CACS in the entire dataset. For ML prediction of high CACS, the dataset was randomly divided into training and test datasets at a 7:3 ratio. BLR, CatBoost, and XGBoost algorithms were tuned with 5-fold cross-validation and grid search to find the best-performing classifier. The performance of the algorithms was compared using the area under the receiver operating characteristic curve (AUROC).

Results: A total of 2133 participants were included in the final analysis. The mean age was 55.4 ± 11.3 years, and 1483 participants (69.5%) were male. In multivariable BLR analysis, age (odds ratio [OR], 1.12; 95% confidence interval [CI], 1.10–1.15; p < 0.001), male sex (OR, 2.91; 95% CI, 1.57–5.38; p < 0.001), systolic blood pressure (OR, 1.02; 95% CI, 1.00–1.03; p = 0.019), and low-density lipoprotein cholesterol (OR, 1.00; 95% CI, 0.99–1.00; p = 0.047) were significant predictors of high CACS. XGBoost achieved the highest AUROC for predicting high CACS (0.823), followed by CatBoost (0.750) and BLR (0.585). The difference in AUROC between XGBoost and BLR was significant (p < 0.001).

Conclusions: The XGBoost algorithm predicted high CACS in healthy participants more reliably than BLR. ML algorithms may be useful for predicting CACS with only laboratory data in healthy participants.
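
As a rough illustration of the evaluation described in the Methods, the sketch below shows a random 7:3 split and an AUROC computed as the probability that a randomly chosen high-CACS case receives a higher score than a randomly chosen low-CACS case. It is not the authors' code: the function names `random_split` and `auroc`, the seed, and the toy labels and scores are assumptions made for illustration, and the model fitting step (BLR, CatBoost, or XGBoost with grid search) is deliberately left out.

```python
import random


def random_split(n_samples, test_fraction=0.3, seed=0):
    """Randomly split indices 0..n_samples-1 into (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(n_samples * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative cases.

    labels: 1 for high CACS (Agatston score >= 100), 0 otherwise.
    scores: predicted probabilities or any monotone risk score.
    Ties between a positive and a negative case count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))

# A 7:3 split of a cohort the size of the study population (2133 participants).
train_idx, test_idx = random_split(2133)
print(len(train_idx), len(test_idx))
```

The pairwise form is quadratic in the number of cases, which is fine for a cohort of this size; a rank-based implementation would be the usual choice for much larger datasets.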
