Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data

Metabolomics is a powerful phenotyping tool in nutrition and health research, generating complex data that require dedicated processing to enrich knowledge of biological systems. To investigate relations between environmental factors, phenotypes and metabolism, discriminant statistical analyses are generally performed on the metabolomic datasets alone and then complemented by associations with metadata. Another relevant strategy is to analyse thematic data blocks simultaneously with multi-block partial least squares discriminant analysis (MBPLSDA), which quantifies the importance of variables and blocks in discriminating groups of subjects while taking the data structure into account. The objective of the present work was to develop a fully open-source, standalone tool covering all steps of MBPLSDA for the joint analysis of metabolomic and epidemiological data. The tool is based on the mbpls function of the ade4 R package, extended with additional functionalities, including some dedicated to discriminant analysis. The indicators provided help to determine the optimal number of components, to check the validity of the MBPLSDA model, and to evaluate the variability of its parameters and predictions. To illustrate the potential of the tool, MBPLSDA was applied to a real case study involving metabolomic, nutritional and clinical data from a human cohort. Having these functionalities available in a single R package made it possible to optimize parameters for an efficient joint analysis of metabolomic and epidemiological data and to gain new insights into multidimensional phenotypes. In particular, we highlight the impact of filtering the metabolomic variables beforehand and the relevance of the MBPLSDA approach compared with a standard PLS discriminant analysis.
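To make the idea of a multi-block discriminant analysis more concrete, the following is a minimal sketch in Python with NumPy. It is not the ade4 mbpls implementation nor the tool described above. It assumes one common formulation in which each block is autoscaled and rescaled to unit total inertia, the blocks are concatenated, the class labels are dummy-coded, and PLS components are extracted from the cross-product between the blocks and the dummy matrix. The function names (dummy_code, scale_block, mbplsda_sketch), the simulated "metabolites" and "diet" blocks, and the "block_share" indicator (the share of squared weights falling in each block, a simplified stand-in for a block importance index) are all hypothetical and chosen for illustration only.

```python
import numpy as np


def dummy_code(labels):
    """Turn class labels into a 0/1 indicator matrix (one column per class)."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)
    return Y, classes


def scale_block(X):
    """Autoscale the columns, then rescale the block to unit total inertia."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    Xs = Xc / sd
    inertia = (Xs ** 2).sum() / X.shape[0]
    return Xs / np.sqrt(inertia) if inertia > 0 else Xs


def mbplsda_sketch(blocks, labels, n_comp=2):
    """PLS-DA on concatenated, equally weighted blocks (illustrative only)."""
    Y, classes = dummy_code(labels)
    y_mean = Y.mean(axis=0)
    Yc = Y - y_mean
    scaled = [scale_block(b) for b in blocks]
    bounds = np.cumsum([0] + [b.shape[1] for b in scaled])
    Xk = np.hstack(scaled)
    scores, block_share = [], []
    for _ in range(n_comp):
        # Unit-norm weight vector maximizing the covariance between Xk @ w and Yc
        U = np.linalg.svd(Xk.T @ Yc, full_matrices=False)[0]
        w = U[:, 0]
        t = Xk @ w                 # global component (scores)
        p = Xk.T @ t / (t @ t)     # loadings used for deflation
        Xk = Xk - np.outer(t, p)   # deflate X so the next scores are orthogonal
        scores.append(t)
        # Share of squared weights per block (sums to 1 for each component)
        block_share.append(
            [np.sum(w[a:b] ** 2) for a, b in zip(bounds[:-1], bounds[1:])]
        )
    T = np.column_stack(scores)
    Q = np.linalg.lstsq(T, Yc, rcond=None)[0]  # regress dummy-coded Y on scores
    Y_hat = T @ Q + y_mean
    return {
        "scores": T,
        "block_share": np.array(block_share),
        "predicted": classes[np.argmax(Y_hat, axis=1)],
    }


# Simulated example: only the first block carries the group difference.
rng = np.random.default_rng(0)
labels = np.repeat(["case", "control"], 20)
shift = (labels == "case")[:, None] * 4.0
metabolites = rng.normal(size=(40, 5)) + shift
diet = rng.normal(size=(40, 4))
fit = mbplsda_sketch([metabolites, diet], labels)
print("block shares, component 1:", fit["block_share"][0])
print("training accuracy:", np.mean(fit["predicted"] == labels))
```

The accuracy printed here is computed on the training data and is therefore optimistic. In practice, the number of components and the model validity would be assessed with resampling (for example cross-validation, permutation tests and bootstrap), which is the kind of indicator the abstract refers to.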
