Statistical strategies for relating metabolomics and proteomics data: a real case study in nutrition research area

This work was carried out within a nutritional case study assessing the postnatal impact of maternal dietary protein restriction during pregnancy and lactation on the plasma metabolome and hypothalamic proteome of rat offspring. Although data generated by different "omics" technologies are usually analyzed separately, relating them offers a valuable opportunity to assess the emerging "integrated biology" concept. The analysis strategy first addressed data pretreatment and variable selection for each dataset. Three multivariate analyses were then applied to investigate the links between metabolite abundance and protein expression measured on the same samples. Unfold principal component analysis and regularized canonical correlation analysis did not take into account the groups of individuals defined by the intervention study. In contrast, the predictive MultiBlock partial least squares method used this information. Regularized canonical correlation analysis appeared to be a relevant approach for investigating the relationships between the two datasets. However, for highlighting the molecular compounds (proteins and metabolites) associated in interacting or common metabolic pathways for the experimental groups, MultiBlock partial least squares was the most appropriate method in the present nutritional case study.
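To make the idea of regularized canonical correlation analysis more concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names (`inv_sqrt`, `regularized_cca`), the ridge parameters `lambda_x` and `lambda_y`, and the simulated data are all assumptions introduced for illustration. The sketch adds a ridge term to each within-block covariance matrix so that they remain invertible when there are more variables than samples, then obtains the canonical correlations from a singular value decomposition of the whitened cross-covariance.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse symmetric square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T


def regularized_cca(X, Y, lambda_x, lambda_y, n_components=2):
    """Ridge-regularized CCA between two blocks measured on the same samples.

    lambda_x and lambda_y are hypothetical ridge parameters (must be > 0);
    in practice they would be tuned, for example by cross-validation.
    """
    # Center each block column-wise
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Ridge-regularized within-block covariances and the cross-covariance
    Cxx = X.T @ X / (n - 1) + lambda_x * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + lambda_y * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    # Whiten both blocks, then take the SVD of the whitened cross-covariance
    Wx = inv_sqrt(Cxx)
    Wy = inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)

    # Map singular vectors back to canonical weights in the original variables
    A = Wx @ U[:, :n_components]
    B = Wy @ Vt.T[:, :n_components]
    return s[:n_components], A, B


# Simulated example (not the study data): 20 samples, 50 and 30 variables,
# with one shared latent factor driving both blocks.
rng = np.random.default_rng(0)
n_samples = 20
latent = rng.normal(size=(n_samples, 1))
X_demo = latent @ rng.normal(size=(1, 50)) + 0.5 * rng.normal(size=(n_samples, 50))
Y_demo = latent @ rng.normal(size=(1, 30)) + 0.5 * rng.normal(size=(n_samples, 30))

corrs, A, B = regularized_cca(X_demo, Y_demo, lambda_x=0.1, lambda_y=0.1)
print(corrs)  # regularized canonical correlations, largest first
```

Note that this sketch, like the regularized canonical correlation analysis described above, uses no information about group membership; a supervised method would additionally need the group labels as a response block.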
