ASCA: The Implementation of Design of Experiments Into Multivariate Modelling in Chemometrics

Abstract The untargeted metabolomics paradigm can help to reveal biochemical patterns in multifactorial experiments in environmental analysis. Specific combinations of metabolites may be highly specific biomarkers for environmental and/or ecological change. However, such patterns need to be recovered from a background of many unrelated metabolites within a wealth of co-occurring environmental processes. The range of methods we present here, based around Analysis-of-Variance-Simultaneous Component Analysis (ASCA), has been specifically developed to retrieve such patterns. These methods combine the merits of quantitatively describing the Design-of-Experiments that underlies an environmental study with the multivariate nature of metabolomics data. The ASCA toolbox has by now grown into a comprehensive and generic approach that supports, e.g., the analysis of unbalanced data and the quantitative assessment of the significance of effects and of relevant biomarkers. The same approach can also be taken to analyse specific effects with respect to positive and negative controls, to reveal experimentally relevant deviations in metabolism. We show the ASCA results for a plant chemical ecology dataset, in which all glucosinolates within a wild cabbage were profiled upon induction of an ecological defence response. ASCA provides direct insight into the variability associated with different aspects of this response, and relatively recent extensions in data preprocessing reveal very clearly the metabolites that are most relevant to the response.
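To make the core idea concrete, the following is a minimal sketch of how an ASCA-style analysis may be set up. It is not the authors' implementation. It assumes a balanced two-factor design, models main effects only (any interaction is left in the residual), and uses hypothetical names (`asca_balanced`, `factor_a`, `factor_b`). Each effect matrix is filled with level means, as in ANOVA, and is then decomposed by a singular value decomposition to obtain scores and loadings per factor.

```python
import numpy as np

def asca_balanced(X, factor_a, factor_b, n_components=2):
    """ASCA-style sketch for a balanced two-factor design (main effects only)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)            # remove the overall mean per variable
    total_ss = np.sum(Xc ** 2)

    # ANOVA step: one effect matrix per factor, filled with level means.
    effects = {}
    residual = Xc.copy()
    for name, labels in (("A", np.asarray(factor_a)), ("B", np.asarray(factor_b))):
        E = np.zeros_like(Xc)
        for level in np.unique(labels):
            mask = labels == level
            E[mask] = Xc[mask].mean(axis=0)
        effects[name] = E
        residual -= E                  # interaction and noise stay in the residual

    # Component step: PCA (via SVD) on each effect matrix separately.
    models = {}
    for name, E in effects.items():
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        models[name] = {
            "scores": U[:, :n_components] * s[:n_components],
            "loadings": Vt[:n_components].T,
            "explained": np.sum(E ** 2) / total_ss,  # fraction of total variation
        }
    return models, residual

# Illustrative use on simulated data: 2 x 3 design, 4 replicates per cell.
rng = np.random.default_rng(0)
a = np.repeat([0, 1], 12)
b = np.tile(np.repeat([0, 1, 2], 4), 2)
X = rng.normal(size=(24, 5))
X[a == 1, 0] += 3.0                    # simulated factor-A effect on variable 0
models, resid = asca_balanced(X, a, b)
```

Because the design is balanced, the effect matrices are mutually orthogonal, so the fractions of variation per factor and the residual fraction add up to one. Unbalanced designs break this property and need the extensions mentioned in the abstract.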
