PhenoMeNal: Processing and analysis of Metabolomics data in the Cloud

Background: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism’s metabolism. The research field is dynamic and expanding, with applications across biomedical, biotechnological and many other applied biological domains. Its computationally intensive nature has driven requirements for open data formats, data repositories and data analysis tools. However, the rapid progress has resulted in a mosaic of independent, and sometimes incompatible, analysis methods that are difficult to connect into a useful and complete data analysis solution.

Findings: The PhenoMeNal (Phenome and Metabolome aNalysis) e-infrastructure provides a complete, workflow-oriented, interoperable metabolomics data analysis solution for a modern infrastructure-as-a-service (IaaS) cloud platform. PhenoMeNal seamlessly integrates a wide array of existing open source tools, which are tested and packaged as Docker containers through the project’s continuous integration process and deployed on a Kubernetes orchestration framework. It also provides a number of standardized, automated and published analysis workflows in the user interfaces Galaxy, Jupyter, Luigi and Pachyderm.

Conclusions: PhenoMeNal constitutes a keystone solution in cloud infrastructures available for metabolomics. It provides scientists with a ready-to-use, workflow-driven, reproducible and shareable data analysis platform that harmonizes software installation and configuration through user-friendly web interfaces. The deployed cloud environments can be scaled dynamically to enable large-scale analyses, which are interfaced through standard data formats, versioned, and tested for reproducibility and interoperability. The flexible implementation of PhenoMeNal allows easy adaptation of the infrastructure to other application areas and ‘omics research domains.