Predicting Metabolic Syndrome With Machine Learning Models Using a Decision Tree Algorithm: Retrospective Cohort Study (Preprint)

BACKGROUND Metabolic syndrome is a cluster of disorders that significantly influences the development and progression of numerous diseases. FibroScan is an ultrasound-based device that was recently shown to predict metabolic syndrome with moderate accuracy. However, previous research on predicting metabolic syndrome in subjects examined with FibroScan has relied mainly on conventional statistical models. Machine learning, in which a computer algorithm learns from prior experience, can achieve better predictive performance than conventional statistical modeling.

OBJECTIVE We aimed to evaluate the accuracy of different decision tree machine learning algorithms in predicting metabolic syndrome status in self-paid health examination subjects who were examined with FibroScan.

METHODS Multivariate logistic regression was performed on the known risk factors for metabolic syndrome. Principal components analysis was used to visualize the distribution of patients with metabolic syndrome. We further applied various statistical machine learning techniques to visualize and investigate the patterns and relationships between metabolic syndrome and several risk variables.

RESULTS Obesity, serum glutamic-oxaloacetic transaminase, serum glutamic-pyruvic transaminase, controlled attenuation parameter score, and glycated hemoglobin emerged as significant risk factors in multivariate logistic regression. The areas under the receiver operating characteristic curve for classification and regression trees and for the random forest were 0.831 and 0.904, respectively.

CONCLUSIONS Machine learning technology facilitates the identification of metabolic syndrome in self-paid health examination subjects with high accuracy.
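
For readers unfamiliar with the metric reported in the results, the following is a minimal sketch of how an area under the receiver operating characteristic curve can be computed from a classifier's scores. It uses the rank-based interpretation (the probability that a randomly chosen positive case is scored above a randomly chosen negative case). The function name `roc_auc` and the labels and scores are hypothetical illustrations, not the study's code or data.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of positive/negative pairs in which the
    positive case has the higher score; tied pairs count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = metabolic syndrome present, 0 = absent.
# The scores stand in for predicted probabilities from any classifier.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based rank computation would be preferable for large cohorts.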
