Joint Analysis of Dependent Features within Compound Spectra Can Improve Detection of Differential Features

Mass spectrometry is an important analytical technology in metabolomics. After the initial feature detection and alignment steps, raw data processing yields a high-dimensional data matrix of mass spectral features, which is then subjected to further statistical analysis. Univariate hypothesis tests such as Student’s t-test and analysis of variance (ANOVA) aim to detect differences between two or more sample classes, e.g., between wild type and mutant or between different treatment doses. In both cases, one of the underlying assumptions is that the metabolic features are independent of each other. However, in mass spectrometry, a single metabolite usually gives rise to several mass spectral features, which are observed together and show a common behavior. This paper proposes grouping the related features of a metabolite into compound spectra with CAMERA, and then using a multivariate statistical method to test whether a compound spectrum (and thus the actual metabolite) is differential between two sample classes. The multivariate method is first demonstrated in a comparison between the wild type and an over-expression line of the model plant Arabidopsis thaliana. For a quantitative evaluation, data sets with a known simulated effect between two sample classes were analyzed. The spectra-wise analysis showed better detection results for all simulated effects.
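The idea of testing a whole compound spectrum jointly, rather than each feature separately, can be illustrated with a minimal sketch. It is not the method used in the paper: the abstract does not name the multivariate statistic, so the sketch substitutes an exact permutation test on the squared Euclidean distance between the class mean vectors. The intensity values, the variable names, and the assumption that the three features were already grouped into one compound spectrum (for example by CAMERA) are all hypothetical.

```python
from itertools import combinations


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_distance(group_a, group_b):
    """Squared Euclidean distance between the two class mean vectors."""
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    return sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))


def spectrum_permutation_test(group_a, group_b):
    """Exact permutation p-value for one compound spectrum.

    Each row is one sample; each column is one feature of the same
    (assumed) compound spectrum. All features are tested jointly.
    """
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = mean_distance(group_a, group_b)
    hits = 0
    total = 0
    # Enumerate every way of assigning n_a samples to the first class.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        perm_a = [pooled[i] for i in idx]
        perm_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if mean_distance(perm_a, perm_b) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical intensities: 4 samples per class, 3 features of one spectrum.
wild_type = [
    [10.0, 5.1, 2.0],
    [10.4, 5.0, 2.1],
    [9.8, 4.9, 1.9],
    [10.2, 5.2, 2.0],
]
over_expression = [
    [13.9, 7.0, 2.8],
    [14.3, 7.2, 2.9],
    [14.0, 6.9, 2.7],
    [14.2, 7.1, 2.8],
]

p_value = spectrum_permutation_test(wild_type, over_expression)
print(f"spectrum-wise permutation p-value: {p_value:.4f}")
```

With four samples per class there are only 70 possible class assignments, and each split and its mirror image give the same statistic, so the smallest attainable p-value is 2/70. A real analysis would need more replicates, one test per compound spectrum, and a multiple-testing correction across spectra.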
