Combinatorial Use of Machine Learning and Logistic Regression for Predicting Carotid Plaque Risk Among 5.4 Million Adults With Fatty Liver Disease Receiving Health Check-Ups: Population-Based Cross-Sectional Study

Background: Carotid plaque can progress to conditions such as stroke and myocardial infarction, which are major global causes of death. Evidence shows a significantly higher incidence of carotid plaque among patients with fatty liver disease. However, unlike fatty liver disease, which has a high detection rate, carotid plaque is not yet widely screened for in the asymptomatic population because of cost-effectiveness concerns. As a result, many carotid plaques go undetected, especially among patients with fatty liver disease.

Objective: This study aimed to combine the advantages of machine learning (ML) and logistic regression to develop a straightforward prediction model that identifies individuals at risk of carotid plaque among the population with fatty liver disease.

Methods: Our study included 5,420,640 participants with fatty liver from Meinian Health Care Center. We used 3 ML algorithms, random forest, elastic net (EN), and extreme gradient boosting, to select important features from potential predictors. Features selected by all 3 models were entered into logistic regression analysis to develop a carotid plaque prediction model. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration curve, Brier score, and decision curve analysis, both in a randomly split internal validation data set and in an external validation data set comprising 32,682 participants from MJ Health Check-up Center. Risk cutoff points for carotid plaque were determined based on the Youden index, predicted probability distribution, and prevalence rate of the internal validation data set to classify participants into high-, intermediate-, and low-risk groups. This risk classification was further validated in the external validation data set.

Results: Among the participants, 26.23% (1,421,970/5,420,640) were diagnosed with carotid plaque in the development data set, and 21.64% (7074/32,682) were diagnosed in the external validation data set. A total of 6 features, namely age, systolic blood pressure, low-density lipoprotein cholesterol (LDL-C), total cholesterol, fasting blood glucose, and hepatic steatosis index (HSI), were selected by all 3 ML models out of 27 predictors. After collinearity between features was addressed, the logistic regression model established with the 5 independent predictors reached an AUC of 0.831 in the internal validation data set and 0.801 in the external validation data set, and its calibration curves showed good calibration. Its overall predictive performance was competitive with that of logistic regression or ML algorithms used alone. Optimal predicted probability cutoff points of 25% and 65% were determined for classifying individuals into low-, intermediate-, and high-risk categories for carotid plaque.

Conclusions: The combination of ML and logistic regression yielded a practical carotid plaque prediction model with important public health implications for the early identification and risk assessment of carotid plaque among individuals with fatty liver.
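The risk stratification described above can be illustrated with a minimal sketch. The 25% and 65% cutoff points are taken from the Results; everything else is an assumption made for illustration, including the function names `classify_risk` and `youden_cutoff`, the handling of probabilities that fall exactly on a cutoff, and the toy data. The study derived its cutoffs from the Youden index together with the predicted probability distribution and prevalence rate, so the Youden search shown here covers only one of those three inputs and is not the authors' procedure.

```python
from typing import Sequence, Tuple

# Cutoff points reported in the abstract (25% and 65%).
LOW_CUTOFF = 0.25
HIGH_CUTOFF = 0.65


def classify_risk(prob: float,
                  low: float = LOW_CUTOFF,
                  high: float = HIGH_CUTOFF) -> str:
    """Map a predicted probability to a risk tier.

    Assumption: a probability exactly equal to a cutoff falls into the
    higher tier; the abstract does not state how boundaries are handled.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    if prob < low:
        return "low"
    if prob < high:
        return "intermediate"
    return "high"


def youden_cutoff(y_true: Sequence[int],
                  y_prob: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is predicted positive when its probability is >= threshold.
    Ties in J are resolved in favor of the smallest threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best_threshold, best_j = 0.0, float("-inf")
    for threshold in sorted(set(y_prob)):
        true_pos = sum(1 for y, p in zip(y_true, y_prob)
                       if y == 1 and p >= threshold)
        true_neg = sum(1 for y, p in zip(y_true, y_prob)
                       if y == 0 and p < threshold)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    labels = [0, 0, 1, 1]
    probs = [0.10, 0.40, 0.35, 0.80]
    print(youden_cutoff(labels, probs))
    print([classify_risk(p) for p in probs])
```

The Youden search is quadratic in the number of distinct probabilities, which is acceptable for a sketch; on a data set of the size reported here, a sorted single-pass computation over the receiver operating characteristic curve would be the more realistic choice.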
