ProbMetab: an R package for Bayesian probabilistic annotation of LC-MS-based metabolomics

Summary: We present ProbMetab, an R package that promotes substantial improvement in automatic probabilistic annotation of liquid chromatography–mass spectrometry-based metabolomes. The core of the inference engine is a Bayesian model implemented to (i) allow diverse sources of experimental data and metadata to be systematically incorporated into the model, with alternative ways to calculate the likelihood function, and (ii) allow sensitive selection of biologically meaningful biochemical reaction databases as a Dirichlet-categorical prior distribution. Additionally, to support interpretation of the results by systems biologists, we display the annotation as a network in which observed mass peaks are connected if their candidate metabolites are substrates or products of known biochemical reactions. This graph can be overlaid with other graph-based analyses, such as partial correlation networks, in a visualization scheme exported to Cytoscape, with web and stand-alone versions.

Availability and implementation: ProbMetab was implemented in a modular manner to fit together with established upstream (xcms, CAMERA, AStream, mzMatch.R, etc.) and downstream (GeneNet, RCytoscape, DiffCorr, etc.) R package tools. ProbMetab, along with extensive documentation and case studies, is freely available under GNU license at: http://labpib.fmrp.usp.br/methods/probmetab/.

Contact: rvencio@usp.br

Supplementary information: Supplementary data are available at Bioinformatics online.
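As a rough illustration of how a likelihood and a Dirichlet-categorical prior can combine when ranking candidate compounds for a single mass peak, consider the minimal sketch below. It is not ProbMetab's implementation or API. The Gaussian likelihood on the ppm mass error, the function names, the candidate names and the pseudo-count values are all assumptions made for this example. The pseudo-counts stand in for prior support from a reaction database, for instance how many known reactions link a candidate to candidates of other peaks.

```python
import math


def gaussian_ppm_likelihood(observed_mz, theoretical_mz, sigma_ppm=5.0):
    """Unnormalized Gaussian likelihood of the mass error, in ppm (assumed model)."""
    ppm_error = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return math.exp(-0.5 * (ppm_error / sigma_ppm) ** 2)


def annotate_peak(observed_mz, candidates, pseudo_counts, sigma_ppm=5.0):
    """Posterior probability of each candidate compound for one mass peak.

    candidates:    dict mapping candidate name -> theoretical m/z
    pseudo_counts: dict mapping candidate name -> Dirichlet pseudo-count
                   (hypothetical values standing in for reaction-database support)
    """
    total_alpha = sum(pseudo_counts[name] for name in candidates)
    unnormalized = {}
    for name, theoretical_mz in candidates.items():
        # The prior mean of a Dirichlet-categorical model is alpha_k / sum(alpha).
        prior = pseudo_counts[name] / total_alpha
        likelihood = gaussian_ppm_likelihood(observed_mz, theoretical_mz, sigma_ppm)
        unnormalized[name] = prior * likelihood
    evidence = sum(unnormalized.values())
    if evidence == 0.0:
        raise ValueError("no candidate has non-zero support for this peak")
    return {name: weight / evidence for name, weight in unnormalized.items()}


# Two hypothetical isomers share the same theoretical m/z, so mass alone
# cannot separate them; a third candidate lies about 550 ppm away.
candidates = {"isomer_A": 181.0707, "isomer_B": 181.0707, "distant_C": 181.1707}
pseudo_counts = {"isomer_A": 3.0, "isomer_B": 1.0, "distant_C": 1.0}

posterior = annotate_peak(181.0707, candidates, pseudo_counts)
for name, probability in sorted(posterior.items(), key=lambda item: -item[1]):
    print(f"{name}: {probability:.3f}")
```

In this toy case the two isomers have identical likelihoods, so their posterior ratio equals the ratio of their pseudo-counts (3:1), while the distant candidate receives essentially no probability mass. This shows, under the stated assumptions, how a reaction-informed prior can break ties that mass accuracy alone cannot.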
