Pathway-Activity Likelihood Analysis and Metabolite Annotation for Untargeted Metabolomics Using Probabilistic Modeling

Motivation: Untargeted metabolomics comprehensively characterizes small molecules and elucidates activities of biochemical pathways within a biological sample. Despite computational advances, interpreting the collected measurements and determining their biological role remains a challenge.

Results: To interpret measurements, we present an inference-based approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA). Our approach captures the metabolomics measurements and the biological network for the sample under study in a generative model and uses stochastic sampling to compute posterior probability distributions. PUMA predicts the likelihood of pathways being active and then derives probabilistic annotations, which assign chemical identities to measurements. Unlike prior pathway analysis tools that analyze differentially active pathways, PUMA defines a pathway as active if the likelihood that the pathway generated the observed measurements is above a user-defined threshold. Because "ground truth" metabolomics datasets, in which all measurements are annotated and pathway activities are known, are lacking, PUMA is validated on synthetic datasets designed to mimic cellular processes. On average, PUMA outperforms pathway enrichment analysis by 8%. PUMA is also applied to two case studies, where it suggests many biologically meaningful pathways as active. Annotation results were in agreement with those obtained using other tools that utilize additional information in the form of spectral signatures. Importantly, PUMA annotates many measurements, suggesting 23 chemical identities for metabolites that were previously only identified as isomers, and a significant number of additional putative annotations over spectral database lookups. For an experimentally validated 50-compound dataset, annotations using PUMA yielded 0.833 precision and 0.676 recall.
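To make the thresholding rule concrete, the following is a minimal sketch of the idea, not PUMA's actual model. It assumes a toy noisy-OR style generative model in which an active pathway produces each of its metabolites, and it replaces stochastic sampling with exact enumeration over pathway states, which is only feasible for a handful of pathways. All names and parameters (`pathway_posteriors`, `active_pathways`, `prior`, `p_detect`, `p_noise`, and the example pathways) are hypothetical and chosen for illustration.

```python
from itertools import product


def pathway_posteriors(pathways, observed, prior=0.5, p_detect=0.9, p_noise=0.05):
    """Posterior P(pathway active | observed metabolites) in a toy model.

    Assumptions (illustrative only): each pathway is active with probability
    `prior`; a metabolite produced by at least one active pathway is observed
    with probability `p_detect`, otherwise with probability `p_noise`.
    """
    names = sorted(pathways)
    metabolites = sorted(set().union(*pathways.values()) | set(observed))
    weights = {}
    # Enumerate every on/off configuration of the pathways.
    for states in product([0, 1], repeat=len(names)):
        active = {n for n, s in zip(names, states) if s}
        w = 1.0
        for s in states:
            w *= prior if s else 1 - prior
        for m in metabolites:
            produced = any(m in pathways[n] for n in active)
            p_obs = p_detect if produced else p_noise
            w *= p_obs if m in observed else 1 - p_obs
        weights[states] = w
    total = sum(weights.values())
    # Marginalize: sum the normalized weight of configurations with pathway i on.
    return {
        n: sum(w for st, w in weights.items() if st[i]) / total
        for i, n in enumerate(names)
    }


def active_pathways(posteriors, threshold=0.5):
    """Call a pathway active when its posterior exceeds the user-defined threshold."""
    return sorted(n for n, p in posteriors.items() if p > threshold)


# Hypothetical example: only the glycolysis metabolites were measured.
pathways = {
    "glycolysis": {"glucose", "pyruvate"},
    "tca_cycle": {"citrate", "succinate"},
}
observed = {"glucose", "pyruvate"}
posteriors = pathway_posteriors(pathways, observed)
print(active_pathways(posteriors, threshold=0.5))
```

In this toy setting the pathway whose metabolites were all observed receives a posterior close to 1, while the pathway with no observed metabolites falls well below the threshold. A real application would need sampling, since the number of pathway configurations grows exponentially with the number of pathways.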
