compMS2Miner: An Automatable Metabolite Identification, Visualization, and Data-Sharing R Package for High-Resolution LC-MS Data Sets.

A long-standing challenge of untargeted metabolomic profiling by ultrahigh-performance liquid chromatography-high-resolution mass spectrometry (UHPLC-HRMS) is the efficient transition from unknown mass spectral features to confident metabolite annotations. The compMS2Miner (Comprehensive MS2 Miner) package was developed in the R language to facilitate rapid, comprehensive feature annotation using peak-picker output and MS2 data files as inputs. The number of MS2 spectra that can be collected during a metabolomic profiling experiment far outweighs the time available for painstaking manual interpretation; therefore, a degree of software workflow autonomy is required for broad-scale metabolite annotation. CompMS2Miner integrates many useful tools in a single workflow for metabolite annotation and provides a means to overview the MS2 data with a Web application GUI, compMS2Explorer (Comprehensive MS2 Explorer), which also facilitates data sharing and transparency. The automatable compMS2Miner workflow consists of the following steps: (i) matching unknown MS1 features to precursor MS2 scans; (ii) filtration of spectral noise (dynamic noise filter); (iii) generation of composite mass spectra by signal summation of multiple similar spectra and removal of redundant/contaminant spectra; (iv) interpretation of possible fragment ion substructures using an internal database; (v) annotation of unknowns with chemical and spectral databases, with prediction of mammalian biotransformation metabolites, wrapper functions for in silico fragmentation software, nearest-neighbor chemical similarity scoring, random-forest-based retention time prediction, text-mining-based false positive removal/true positive ranking, chemical taxonomic prediction, and differential-evolution-based global annotation score optimization; and (vi) network graph visualization, data curation, and sharing via the compMS2Explorer application.
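Step (i) of the workflow can be illustrated with a minimal Python sketch. It is not the compMS2Miner implementation (the package is written in R); the class names, function names, and the tolerance defaults of 10 ppm and 20 s are assumptions chosen for illustration, and the m/z and retention time values are toy data. An MS2 scan is assigned to an MS1 feature when its precursor m/z lies within a ppm tolerance of the feature m/z and its retention time lies within a fixed window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """An MS1 feature from a peak picker (hypothetical structure)."""
    feature_id: str
    mz: float
    rt: float  # retention time in seconds


@dataclass(frozen=True)
class Ms2Scan:
    """An MS2 scan with its precursor m/z (hypothetical structure)."""
    scan_id: int
    precursor_mz: float
    rt: float  # retention time in seconds


def ppm_error(observed: float, reference: float) -> float:
    """Signed mass error of `observed` relative to `reference`, in ppm."""
    return (observed - reference) / reference * 1e6


def match_features_to_scans(features, scans, ppm_tol=10.0, rt_tol=20.0):
    """Map each feature id to the ids of MS2 scans within both tolerances.

    ppm_tol and rt_tol are illustrative defaults, not package settings.
    """
    matches = {f.feature_id: [] for f in features}
    for f in features:
        for s in scans:
            within_mass = abs(ppm_error(s.precursor_mz, f.mz)) <= ppm_tol
            within_rt = abs(s.rt - f.rt) <= rt_tol
            if within_mass and within_rt:
                matches[f.feature_id].append(s.scan_id)
    return matches


# Toy data: scan 101 matches F1; scan 102 fails the mass tolerance;
# scan 103 matches F2 in mass but fails the retention time window.
features = [Feature("F1", 496.3398, 612.0), Feature("F2", 180.0634, 95.0)]
scans = [
    Ms2Scan(101, 496.3401, 610.5),
    Ms2Scan(102, 496.3520, 611.0),
    Ms2Scan(103, 180.0633, 300.0),
]
matched = match_features_to_scans(features, scans)
print(matched)
```

A feature may collect several MS2 scans in this way, which is what makes the later composite-spectrum step (iii) possible.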
Metabolite identities and comments can also be recorded using an interactive table within compMS2Explorer. The utility of the package is illustrated with a data set of blood serum samples from 7 diet-induced obese (DIO) and 7 nonobese (NO) C57BL/6J mice, which were also treated with an antibiotic (streptomycin) to knock down the gut microbiota. The results of fully autonomous and objective usage of compMS2Miner are presented here. All automatically annotated spectra output by the workflow are provided in the Supporting Information and can alternatively be explored as publicly available compMS2Explorer applications for both positive and negative modes (https://wmbedmands.shinyapps.io/compMS2_mouseSera_POS and https://wmbedmands.shinyapps.io/compMS2_mouseSera_NEG). The workflow provided rapid annotation of a diversity of endogenous and gut microbially derived metabolites affected by both diet and antibiotic treatment, which conformed to previously published reports. Composite spectra (n = 173) were autonomously matched to entries of the MassBank of North America (MoNA) spectral repository. These experimental and virtual (LipidBlast) spectra corresponded to 29 common endogenous compound classes (e.g., 51 lysophosphatidylcholine spectra) and were then used to calculate the ranking capability of 7 individual scoring metrics. An average of the 7 individual scoring metrics provided the most effective ranking, with a weighted average rank of 3 for the MoNA-matched spectra, in spite of the potential risk of false positive annotations emerging from automation. Minor structural differences, such as relative carbon-carbon double bond positions, were found in several cases to affect the correct rank of the MoNA-annotated metabolite.
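The idea of ranking candidate annotations by an average of several scoring metrics can be sketched as follows. This is a simplified Python illustration, not the scoring used by compMS2Miner: the metric names, the candidate labels, the assumption that every metric is already scaled to the range 0 to 1, and all score values are hypothetical. Candidates are ordered by their unweighted mean score, and the rank of a known correct candidate (for example, a spectral library match) is then read off.

```python
from statistics import mean


def mean_score_ranking(candidate_scores):
    """Order candidate names from highest to lowest mean metric score.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    averaged = {
        name: mean(scores.values())
        for name, scores in candidate_scores.items()
    }
    return sorted(averaged, key=lambda name: (-averaged[name], name))


def rank_of(candidate, ranking):
    """1-based position of `candidate` in an ordered list of names."""
    return ranking.index(candidate) + 1


# Hypothetical scores for three candidate annotations of one spectrum.
candidate_scores = {
    "candidate_A": {"in_silico_frag": 0.9, "rt_prediction": 0.8, "text_mining": 0.7},
    "candidate_B": {"in_silico_frag": 0.85, "rt_prediction": 0.6, "text_mining": 0.2},
    "candidate_C": {"in_silico_frag": 0.3, "rt_prediction": 0.4, "text_mining": 0.5},
}

ranking = mean_score_ranking(candidate_scores)
print(ranking)
print(rank_of("candidate_B", ranking))
```

Averaging the rank of the correct candidate over many library-matched spectra gives a single figure for comparing individual metrics against their combination.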
The latest release, together with an example workflow in the package vignette, is available at https://github.com/WMBEdmands/compMS2Miner, and a version of the published application is available on the shinyapps.io site (https://wmbedmands.shinyapps.io/compMS2Example).
