Prediction of the metabolic syndrome status based on dietary and genetic parameters, using Random Forest

Metabolic syndrome (MS) is a cluster of metabolic abnormalities associated with an increased risk of developing cardiovascular disease, stroke or type II diabetes. The aetiology of MS is complex and is determined by the interplay between genetic and environmental factors, whose respective roles remain difficult to untangle. The aim of this study was to determine which factors, alone or in combination, could be predictive of MS status. Using a large case–control study nested in a well-characterized cohort, we investigated genetic and dietary factors collected at entry in subjects who developed MS 7 years later. We used a classification technique called Random Forest to predict MS status from these data. We obtained an overall out-of-bag estimate of the correct classification rate of 71.7% (72.1% for the control subjects and 70.7% for the cases). The plasma concentration of C16.1 was the most discriminative variable, followed by the plasma concentrations of C18.3(n-6) and C18.2. Three SNPs were selected by Random Forest (APOB rs512535, LTA rs915654 and ACACB rs4766587). These SNPs were also significantly associated with MS in a univariate Fisher test.
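To make the out-of-bag (OOB) correct classification rate concrete, the sketch below shows how such an estimate can be computed overall and per class. It is an illustration under stated assumptions, not the pipeline used in this study: the data are synthetic, the base learners are one-split decision stumps rather than full trees, each stump draws a single random feature, and the names `fit_stump` and `oob_rates` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, rng):
    """Fit a one-split tree on one randomly chosen feature (a simplification)."""
    j = rng.randrange(len(X[0]))
    best = None
    for t in sorted({row[j] for row in X}):
        left = [lab for row, lab in zip(X, y) if row[j] <= t]
        right = [lab for row, lab in zip(X, y) if row[j] > t]
        if not left or not right:
            continue
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:  # feature is constant in this bootstrap sample
        majority = Counter(y).most_common(1)[0][0]
        return lambda row: majority
    _, t, l_lab, r_lab = best
    return lambda row: l_lab if row[j] <= t else r_lab


def oob_rates(X, y, n_trees=100, seed=0):
    """Return OOB correct classification rates, per class and overall."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        tree = fit_stump([X[i] for i in idx], [y[i] for i in idx], rng)
        for i in range(n):
            if i not in in_bag:  # only trees that never saw subject i vote on it
                votes[i][tree(X[i])] += 1
    correct, total = Counter(), Counter()
    for i in range(n):
        if votes[i]:
            hit = votes[i].most_common(1)[0][0] == y[i]
            for key in (y[i], "overall"):
                total[key] += 1
                correct[key] += hit
    return {key: correct[key] / total[key] for key in total}


# Synthetic data (assumption): 60 controls (0) and 40 cases (1), two features.
data_rng = random.Random(1)
y = [0] * 60 + [1] * 40
X = [[data_rng.gauss(3.0 * lab, 1.0), data_rng.gauss(3.0 * lab, 1.0)] for lab in y]
rates = oob_rates(X, y)
print(rates)
```

Because every subject is classified only by the trees whose bootstrap sample excluded it, the OOB rate behaves like an internal validation estimate. The overall rate is the class-size-weighted average of the per-class rates, which is why the overall figure lies between the rates for controls and cases.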