Exploring Omics data from designed experiments using analysis of variance multiblock Orthogonal Partial Least Squares.

Many experimental factors may have an impact on chemical or biological systems. Planning the trials rationally with systematic procedures, i.e. design of experiments, makes it possible to investigate thoroughly the potential effects of the factors and the interactions between them. However, assessing the influence of each factor often remains a challenging task when hundreds to thousands of correlated variables are measured while only a limited number of samples is available. In that context, most existing strategies partition the sources of variation using ANOVA and then analyse each ANOVA submatrix separately with multivariate methods, so as to account for both the intrinsic characteristics of the data and the study design. However, these approaches cannot summarise the data in a single model and remain somewhat limited for detecting and interpreting subtle perturbations hidden in complex Omics datasets. In the present work, a supervised multiblock algorithm based on the Orthogonal Partial Least Squares (OPLS) framework is proposed for the joint analysis of ANOVA submatrices. This strategy has several advantages: (i) the evaluation of a unique multiblock model accounting for all sources of variation; (ii) the computation of a robust estimator (goodness of fit) for assessing the reliability of the ANOVA decomposition; (iii) the investigation of an effect-to-residuals ratio to quickly evaluate the relative importance of each effect; and (iv) an easy interpretation of the model with appropriate outputs. Case studies from metabolomics and transcriptomics illustrate the ability of the method to handle Omics data obtained from fixed-effects full factorial designs. Signal variations are easily related to main effects or interaction terms, and relevant biochemical information can be derived from the models.
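The ANOVA-based partitioning mentioned above can be illustrated with a minimal sketch. It assumes a balanced two-factor, fixed-effects full factorial design and splits a samples-by-variables matrix into a grand-mean matrix, two main-effect submatrices, an interaction submatrix, and residuals. The function name `anova_submatrices`, the simulated data, and the design sizes are hypothetical choices made for illustration; the multiblock OPLS step and the effect-to-residuals ratio of the proposed method are not reproduced here.

```python
import numpy as np


def anova_submatrices(X, a, b):
    """Split X (samples x variables) into additive ANOVA submatrices.

    Assumes a balanced two-factor full factorial design whose factor
    levels are given per sample in `a` and `b` (hypothetical helper).
    """
    X = np.asarray(X, dtype=float)
    a = np.asarray(a)
    b = np.asarray(b)

    # Grand-mean matrix: every row holds the column means of X.
    grand = np.tile(X.mean(axis=0), (X.shape[0], 1))
    Xc = X - grand

    # Main effects: each row holds the mean of its factor level.
    XA = np.zeros_like(X)
    for level in np.unique(a):
        rows = a == level
        XA[rows] = Xc[rows].mean(axis=0)

    XB = np.zeros_like(X)
    for level in np.unique(b):
        rows = b == level
        XB[rows] = Xc[rows].mean(axis=0)

    # Interaction: cell means after removing both main effects.
    XAB = np.zeros_like(X)
    for la in np.unique(a):
        for lb in np.unique(b):
            rows = (a == la) & (b == lb)
            XAB[rows] = (Xc[rows] - XA[rows] - XB[rows]).mean(axis=0)

    residuals = Xc - XA - XB - XAB
    return {"grand": grand, "A": XA, "B": XB, "AB": XAB, "residuals": residuals}


# Simulated example: 2 x 3 design, 2 replicates per cell, 5 variables.
rng = np.random.default_rng(0)
a = np.repeat([0, 1], 6)
b = np.tile(np.repeat([0, 1, 2], 2), 2)
X = rng.normal(size=(12, 5))
X[a == 1, 0] += 2.0  # inject an effect of factor A on the first variable

parts = anova_submatrices(X, a, b)
for name in ("A", "B", "AB", "residuals"):
    # Sum of squares carried by each submatrix.
    print(name, float(np.sum(parts[name] ** 2)))
```

In a balanced design the submatrices are mutually orthogonal, so their sums of squares add up to the total sum of squares of the centred data. Each submatrix could then be treated as one block in a multiblock model.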
