Deciphering complex metabolite mixtures by unsupervised and supervised substructure discovery and semi-automated annotation from MS/MS spectra.

Complex metabolite mixtures are challenging to unravel. Mass spectrometry (MS) is a widely used and sensitive technique for obtaining structural information about complex mixtures. However, knowing only the molecular masses of a mixture's constituents is almost always insufficient to confidently assign their chemical structures. Structural information can be augmented through MS fragmentation experiments, in which detected metabolites are fragmented to produce MS/MS spectra. How, then, can we maximize the structural information gained from these fragmentation spectra? We recently proposed a substructure-based strategy to enhance metabolite annotation in complex mixtures by treating metabolites as the sum of (bio)chemically relevant moieties that can be detected through MS fragmentation. Our MS2LDA tool discovers, in an unsupervised manner, groups of mass fragments and/or neutral losses, termed Mass2Motifs, that often correspond to substructures. After manual annotation, these Mass2Motifs can be used in subsequent MS2LDA analyses of new datasets, thereby providing structural annotations for many molecules that are not present in spectral databases. Here, we describe how two additional strategies can facilitate semi-automated annotation of substructures: (i) combinatorial in silico matching of experimental mass features to substructures of candidate molecules, and (ii) automated machine learning classification of molecules. We show how our approach accelerates the Mass2Motif annotation process and thereby broadens the chemical space spanned by characterized motifs. Our machine learning model learns the relationships between fragmentation spectra and chemical features and uses them to classify spectra. The resulting classification predictions can be aggregated over all molecules that contribute to a particular Mass2Motif to guide its annotation.
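The aggregation step can be illustrated with a short Python sketch. It is not the MS2LDA implementation: it assumes that each spectrum already has per-term probabilities from some classifier and that each spectrum has an overlap value with the motif. The function name `aggregate_motif_predictions`, the `min_overlap` threshold of 0.3, the overlap-weighted mean, and the chemical term labels are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: for each spectrum, a classifier's predicted probability
# for each chemical term (class or substructure label).
Predictions = Dict[str, Dict[str, float]]


def aggregate_motif_predictions(
    predictions: Predictions,
    motif_overlap: Dict[str, float],
    min_overlap: float = 0.3,  # assumed cut-off for "contributes to the motif"
) -> List[Tuple[str, float]]:
    """Rank chemical terms for one Mass2Motif.

    Only spectra whose overlap with the motif is at least `min_overlap`
    contribute. Each term's score is the overlap-weighted mean of its
    predicted probability; a term missing from a spectrum counts as 0.
    """
    totals: Dict[str, float] = defaultdict(float)
    weight_sum = 0.0
    for spectrum_id, overlap in motif_overlap.items():
        if overlap < min_overlap or spectrum_id not in predictions:
            continue
        weight_sum += overlap
        for term, prob in predictions[spectrum_id].items():
            totals[term] += overlap * prob
    if weight_sum == 0.0:
        return []  # no spectrum contributes to this motif
    ranked = [(term, total / weight_sum) for term, total in totals.items()]
    # Highest aggregated probability first; ties broken alphabetically.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Illustrative values only.
    predictions = {
        "spec1": {"indole": 0.9, "carboxylic acid": 0.2},
        "spec2": {"indole": 0.7, "carboxylic acid": 0.6},
        "spec3": {"indole": 0.1, "carboxylic acid": 0.9},
    }
    overlap = {"spec1": 0.8, "spec2": 0.5, "spec3": 0.1}
    print(aggregate_motif_predictions(predictions, overlap))
```

In this example `spec3` falls below the overlap threshold, so "indole" ranks first because it is predicted with high probability for both contributing spectra. Weighting by overlap lets spectra that are better explained by the motif count more towards its suggested annotation.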
To make annotated Mass2Motifs available to the community, we also present MotifDB, an open database of Mass2Motifs that can be browsed and accessed programmatically through an Application Programming Interface (API). MotifDB is integrated within ms2lda.org, allowing users to efficiently search for characterized motifs in their own experiments. We expect that, as a growing database makes more Mass2Motif annotations available, we will gain insight into the constituents of complex mixtures more quickly. This will allow novel or unexpected chemistries to be prioritized and known biochemical building blocks to be recognized faster.
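Searching one's own spectra for an already-characterized motif can be illustrated with a simplified scoring scheme. This sketch does not use the MotifDB API or ms2lda.org. It assumes that a Mass2Motif is stored as a mapping from binned feature labels (fragments as `fragment_<mass>`, neutral losses as `loss_<mass>`) to probabilities. The label format, the 0.005 Da bin width, the 0.5 threshold, and the score itself (the summed motif probability of the features present in a spectrum) are illustrative assumptions that are simpler than the scores MS2LDA reports.

```python
from typing import Dict, Iterable, List, Set, Tuple

BIN_WIDTH = 0.005  # assumed m/z bin width in Da (illustrative choice)


def bin_mass(mass: float) -> str:
    """Snap a mass to the nearest bin centre and format it as a feature label."""
    return f"{round(mass / BIN_WIDTH) * BIN_WIDTH:.4f}"


def to_features(precursor_mz: float, fragment_mzs: Iterable[float]) -> Set[str]:
    """Turn one MS/MS spectrum into binned fragment and neutral-loss features."""
    features: Set[str] = set()
    for mz in fragment_mzs:
        features.add(f"fragment_{bin_mass(mz)}")
        loss = precursor_mz - mz  # neutral loss relative to the precursor ion
        if loss > 0:
            features.add(f"loss_{bin_mass(loss)}")
    return features


def motif_score(features: Set[str], motif: Dict[str, float]) -> float:
    """Simplified match score: total motif probability of features present."""
    return sum(prob for feature, prob in motif.items() if feature in features)


def search_motif(
    spectra: Dict[str, Tuple[float, List[float]]],
    motif: Dict[str, float],
    threshold: float = 0.5,  # hypothetical cut-off
) -> List[Tuple[str, float]]:
    """Return (spectrum_id, score) pairs at or above `threshold`, best first."""
    hits = []
    for spectrum_id, (precursor_mz, fragment_mzs) in spectra.items():
        score = motif_score(to_features(precursor_mz, fragment_mzs), motif)
        if score >= threshold:
            hits.append((spectrum_id, score))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


if __name__ == "__main__":
    # Illustrative motif and spectra; the masses are example values only.
    motif = {
        f"fragment_{bin_mass(118.0651)}": 0.5,
        f"loss_{bin_mass(17.0265)}": 0.3,
        f"fragment_{bin_mass(91.0542)}": 0.2,
    }
    spectra = {
        "s1": (205.0972, [188.0706, 146.0600, 118.0651]),
        "s2": (180.0000, [91.0542, 60.0000]),
    }
    print(search_motif(spectra, motif))
```

Here `s1` contains both the fragment and the neutral loss of the motif and is reported, whereas `s2` matches only the least probable feature and falls below the threshold. Encoding neutral losses alongside fragments lets a motif capture a substructure even when it is lost as an uncharged piece.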