Galaxy-M: a Galaxy workflow for processing and analyzing direct infusion and liquid chromatography mass spectrometry-based metabolomics data

Background: Metabolomics is increasingly recognized as an invaluable tool in the biological, medical and environmental sciences, yet it lags behind the methodological maturity of other omics fields. To achieve its full potential, including the integration of multiple omics modalities, the accessibility, standardization and reproducibility of computational metabolomics tools must be improved significantly.

Results: Here we present our end-to-end mass spectrometry metabolomics workflow in the widely used Galaxy platform. Named Galaxy-M, our workflow has been developed for both direct infusion mass spectrometry (DIMS) and liquid chromatography mass spectrometry (LC-MS) metabolomics. The tools presented span the processing of raw data (e.g. peak picking and alignment), data cleansing (e.g. missing value imputation), preparation for statistical analysis (e.g. normalization and scaling), and principal components analysis (PCA) with associated statistical evaluation. We demonstrate the ease of using these Galaxy workflows via the analysis of DIMS and LC-MS datasets, and provide PCA scores and associated statistics so that other users can check that they have accurately repeated the processing and analysis of these two datasets. Galaxy and data are all provided pre-installed in a virtual machine (VM) that can be downloaded from the GigaDB repository. Additionally, source code, executables and installation instructions are available from GitHub.

Conclusions: The Galaxy platform has enabled us to produce an easily accessible and reproducible computational metabolomics workflow. More tools could be added by the community to expand its functionality. We recommend that Galaxy-M workflow files are included within the supplementary information of publications, enabling metabolomics studies to achieve greater reproducibility.
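The later stages named in the abstract (missing value imputation, normalization, scaling and PCA) can be illustrated with a minimal NumPy sketch. This is not the Galaxy-M code: the function names, the median-based imputation, the choice of probabilistic quotient normalization and autoscaling, and the synthetic peak matrix are all assumptions made here for illustration only.

```python
import numpy as np

def impute_missing(X):
    """Replace NaNs in each feature (column) with that feature's median.
    Median imputation is an assumed, simple choice for this sketch."""
    X = X.copy()
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

def pqn_normalize(X):
    """Probabilistic quotient normalization against the median spectrum.
    Each sample is divided by the median of its feature-wise quotients."""
    reference = np.median(X, axis=0)
    quotients = X / reference
    dilution = np.median(quotients, axis=1, keepdims=True)
    return X / dilution

def autoscale(X):
    """Mean-center each feature and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_scores(X, n_components=2):
    """PCA scores via SVD; X is assumed to be mean-centered already."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Synthetic peak intensity matrix: 6 samples (rows) x 20 peaks (columns).
rng = np.random.default_rng(0)
peaks = rng.lognormal(mean=10, sigma=1, size=(6, 20))
peaks[0, 2] = np.nan  # simulate one missing peak

clean = impute_missing(peaks)
normalized = pqn_normalize(clean)
scaled = autoscale(np.log(normalized))  # log transform before scaling
scores, explained = pca_scores(scaled)
print(scores.shape, explained)
```

Each stage takes and returns a samples-by-peaks matrix, which mirrors how a workflow of independent tools can be chained and re-run with the same inputs to reproduce a set of PCA scores.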
