Identifying Metabotypes From Complex Biological Data Using PARAFAC

Research has identified large individual variation in physiological response to diet, which has led to more focused investigations in precision nutrition. One approach towards personalized nutrition is to identify groups of differential responders, so-called metabotypes (i.e., clusters of individuals with similar metabolic profiles and/or regulation). Metabotyping has previously been addressed using matrix decomposition tools such as principal component analysis (PCA) on data organized in matrix form. However, metabotyping using data from more complex experimental designs, involving e.g. repeated measures over time or multiple treatments (tensor data), requires new methods. We developed a workflow for detecting metabotypes from experimental tensor data. The workflow is based on tensor decomposition, specifically PARAFAC, which is conceptually similar to PCA but extended to multidimensional data. Metabotypes based on metabolomics data were identified from PARAFAC scores using k-means clustering and validated by their association with anthropometric and clinical baseline data. Additionally, we evaluated the robustness of the metabotypes using bootstrapping. We applied the workflow to data from a crossover acute postprandial dietary intervention study on 17 overweight males (BMI 25–30 kg/m², 41–67 y of age) undergoing three dietary interventions (pickled herring, baked herring and baked beef), in which 80 metabolites (from GC-MS metabolomics) were measured at 8 time points (0–7 h). We identified two metabotypes characterized by differences in amino acid levels, predominantly in the beef diet, that were also associated with creatinine (p = 0.007). The metabotype with higher postprandial amino acid levels also had higher fasting creatinine than the other metabotype. The results highlight the potential of PARAFAC to discover metabotypes from complex study designs.
The workflow is not restricted to our data structure and can be applied to any type of tensor data. However, PARAFAC is sensitive to data pre-processing, and further studies relating differential metabotypes to clinical endpoints are warranted. This work has been supported by the Swedish Foundation for Strategic Research and Formas, which is gratefully acknowledged.
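As a rough illustration of the workflow described above (PARAFAC decomposition, k-means clustering of the subject-mode scores, and a bootstrap check of cluster robustness), a minimal sketch in Python follows. It is not the implementation used in the study. Several things are assumptions made only for illustration: the data are synthetic, the tensor is treated as three-way (subjects × metabolites × time points) with the diet mode left out, PARAFAC is fitted with a plain NumPy alternating least squares loop, k-means uses a deterministic farthest-point start, and robustness is summarized with the Rand index. All function names (`parafac_als`, `kmeans`, `rand_index`, `bootstrap_stability`) are hypothetical.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product, shape (rows_A * rows_B, rank)."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def unfold(X, mode):
    """Mode-n unfolding; the remaining axes keep their original order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def parafac_als(X, rank, n_iter=200, seed=0):
    """Plain alternating least squares for a 3-way PARAFAC model."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            factors[mode] = unfold(X, mode) @ np.linalg.pinv(kr).T
    # Push all scale into the subject mode so the scores are comparable.
    for m in (1, 2):
        norms = np.linalg.norm(factors[m], axis=0)
        factors[m] = factors[m] / norms
        factors[0] = factors[0] * norms
    return factors

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point start."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def rand_index(a, b):
    """Fraction of subject pairs on which two clusterings agree."""
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return float(np.mean(same_a == same_b))

def bootstrap_stability(X, labels, rank, k, n_boot=20, seed=1):
    """Resample subjects, refit, recluster, compare with original labels."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boot_labels = kmeans(parafac_als(X[idx], rank)[0], k)
        scores.append(rand_index(labels[idx], boot_labels))
    return float(np.mean(scores))

# Synthetic stand-in data (assumption): 17 subjects, 80 metabolites,
# 8 time points, two groups that differ in the second component.
rng = np.random.default_rng(42)
truth = np.array([0] * 9 + [1] * 8)
A = np.column_stack([rng.normal(1.0, 0.1, 17),
                     2.0 * truth + rng.normal(0.0, 0.1, 17)])
B = rng.standard_normal((80, 2))
t = np.linspace(0, 7, 8)
C = np.column_stack([np.exp(-t / 3), t * np.exp(-t / 2)])
X = np.einsum("ir,jr,kr->ijk", A, B, C)
X = X + 0.1 * rng.standard_normal(X.shape)

labels = kmeans(parafac_als(X, rank=2)[0], k=2)
stability = bootstrap_stability(X, labels, rank=2, k=2)
print("agreement with simulated groups:", rand_index(labels, truth))
print("mean bootstrap Rand index:", stability)
```

The sketch omits centering and scaling of the tensor. Since PARAFAC is sensitive to pre-processing, real data would need those steps chosen with care before fitting, and the number of components and clusters would need to be selected rather than fixed at two.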