MESSAR: Automated recommendation of metabolite substructures from tandem mass spectra

Despite the increasing importance of non-targeted metabolomics for answering life science questions, extracting biochemically relevant information from metabolomics spectral data remains an incompletely solved problem. Most computational tools for identifying tandem mass spectra focus on a limited set of molecules of interest and are typically constrained by the availability of reference spectra or molecular databases, which limits their applicability for generating structural hypotheses for unknown metabolites. In contrast, recent advances in the field show that the underlying biochemistry can be exposed without relying on metabolite identification, in particular via substructure prediction. We describe an automated method for substructure recommendation motivated by association rule mining. Our framework captures potential relationships between spectral features and substructures, learned from public spectral libraries. These associations are then used to recommend substructures for any unknown mass spectrum. Because our method does not require predefined metabolite candidates, it can be used for hypothesis generation or partial identification of unknown unknowns. The method is called MESSAR (MEtabolite SubStructure Auto-Recommender) and is implemented as a free online web service available at messar.biodatamining.be.
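To make the idea of learning feature-to-substructure associations concrete, the following is a minimal sketch of association rule mining on a toy library. It is not the MESSAR implementation: the feature labels (such as `frag_91.05`), the substructure names, the thresholds, and the functions `mine_rules` and `recommend` are all hypothetical, and only single-feature to single-substructure rules are considered. Each rule is kept when its support (fraction of library spectra containing both feature and substructure) and its confidence (fraction of spectra with the feature that also carry the substructure) exceed the chosen thresholds.

```python
from collections import Counter, defaultdict
from itertools import product

# Toy training library (hypothetical data): each entry pairs the spectral
# features of one reference spectrum with the substructures of its molecule.
LIBRARY = [
    ({"frag_91.05", "loss_18.01"}, {"benzyl", "hydroxyl"}),
    ({"frag_91.05", "loss_44.00"}, {"benzyl", "carboxyl"}),
    ({"loss_18.01"}, {"hydroxyl"}),
    ({"frag_91.05"}, {"benzyl"}),
    ({"loss_44.00", "loss_18.01"}, {"carboxyl", "hydroxyl"}),
]


def mine_rules(library, min_support=0.2, min_confidence=0.6):
    """Return {(feature, substructure): confidence} for rules passing both thresholds."""
    n = len(library)
    feature_counts = Counter()
    pair_counts = Counter()
    for features, substructures in library:
        feature_counts.update(features)
        # Count every co-occurring (feature, substructure) pair once per spectrum.
        pair_counts.update(product(features, substructures))

    rules = {}
    for (feature, sub), count in pair_counts.items():
        support = count / n
        confidence = count / feature_counts[feature]
        if support >= min_support and confidence >= min_confidence:
            rules[(feature, sub)] = confidence
    return rules


def recommend(rules, query_features):
    """Rank substructures by the best confidence among rules matching the query."""
    scores = defaultdict(float)
    for (feature, sub), confidence in rules.items():
        if feature in query_features:
            scores[sub] = max(scores[sub], confidence)
    # Highest confidence first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    rules = mine_rules(LIBRARY)
    for substructure, score in recommend(rules, {"frag_91.05", "loss_44.00"}):
        print(substructure, score)
```

Note that the query spectrum needs no candidate molecule: recommendations come only from rules whose feature appears in the query, which mirrors the candidate-free setting described above.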
