MESSAR: Automated recommendation of metabolite substructures from tandem mass spectra

Despite the increasing importance of non-targeted metabolomics for answering a wide range of life science questions, extracting biochemically relevant information from metabolomics spectral data remains an incompletely solved problem. Most computational tools for identifying tandem mass spectra focus on a limited set of molecules of interest, and they are typically constrained by the availability of reference spectra or molecular databases, which limits their applicability to unknown metabolites. In contrast, recent advances in the field illustrate the possibility of exposing the underlying biochemistry without relying on metabolite identification, in particular via substructure prediction. We describe an automated method for substructure recommendation motivated by association rule mining. Our framework captures potential relationships between spectral features and substructures learned from public spectral libraries. These associations are then used to recommend substructures for any unknown mass spectrum. Because our method does not require any predefined metabolite candidates, it can be used for the partial identification of unknown unknowns. The method, called MESSAR (MEtabolite SubStructure Auto-Recommender), is implemented as a free online web service available at messar.biodatamining.be.
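
To make the idea of rule-based substructure recommendation concrete, the following is a minimal sketch of association rule mining between spectral features and substructures. It is not the MESSAR implementation: the toy library, the feature labels (such as `frag_91.05`), the substructure names, the thresholds, and the function names `mine_rules` and `recommend` are all hypothetical, and rules are restricted to a single feature on the left-hand side for brevity.

```python
from collections import Counter
from itertools import product

# Hypothetical toy "spectral library": each entry pairs the spectral features
# of one reference spectrum with the substructures of its known molecule.
LIBRARY = [
    ({"frag_91.05", "loss_18.01"}, {"benzyl", "hydroxyl"}),
    ({"frag_91.05", "frag_65.04"}, {"benzyl"}),
    ({"loss_18.01", "loss_44.00"}, {"hydroxyl", "carboxyl"}),
    ({"frag_91.05", "loss_44.00"}, {"benzyl", "carboxyl"}),
]


def mine_rules(library, min_support=0.25, min_confidence=0.6):
    """Mine single-feature rules of the form: feature -> substructure."""
    n = len(library)
    feature_counts = Counter()
    pair_counts = Counter()
    for features, substructures in library:
        feature_counts.update(features)
        pair_counts.update(product(features, substructures))

    rules = {}
    for (feature, sub), count in pair_counts.items():
        support = count / n                          # P(feature and sub)
        confidence = count / feature_counts[feature]  # P(sub | feature)
        if support >= min_support and confidence >= min_confidence:
            rules[(feature, sub)] = (support, confidence)
    return rules


def recommend(rules, query_features):
    """Rank substructures by the best confidence among matching rules."""
    scores = {}
    for (feature, sub), (_support, confidence) in rules.items():
        if feature in query_features:
            scores[sub] = max(scores.get(sub, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


rules = mine_rules(LIBRARY)
ranking = recommend(rules, {"frag_91.05", "loss_44.00"})
print(ranking)
```

In this sketch the query spectrum needs no candidate molecule: substructures are ranked purely from the rules whose feature appears in the query, which mirrors the candidate-free recommendation described above under the stated simplifications.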
