Taxonomically Informed Scoring Enhances Confidence in Natural Products Annotation

Mass spectrometry (MS) hyphenated to liquid chromatography (LC-MS) offers unrivalled sensitivity for metabolite profiling of the complex biological matrices encountered in natural products (NP) research. With advanced LC-MS platforms, MS/MS spectra are acquired in an untargeted manner on most detected features. This generates massive and complex sets of spectral data that provide valuable structural information on most analytes. Computational methods are mandatory to interpret such datasets. To this end, computerized annotation of metabolites links spectral data to candidate structures. When profiling complex extracts, spectra are often organized in clusters by similarity via Molecular Networking (MN). A spectral matching score is usually established between the acquired data and experimental or theoretical spectral databases (DB). The process leads to various candidate structures for each MS feature. At this stage, obtaining a high level of annotation confidence remains a challenge, notably because of the high chemodiversity of specialized metabolomes. Integrating additional information in a meta-score is a way to capture complementary experimental attributes and improve the annotation process. Here we show that integrating the unambiguous taxonomic position of the analyzed samples and of the candidate structures enhances confidence in metabolite annotation. A script is proposed to automatically input such information at various granularity levels (species, genus, and family) and to weight the score obtained between experimental spectral data and the output of available computational metabolite annotation tools (ISDB-DNP, MS-Finder, Sirius). In all cases, considering the taxonomic distance allowed an efficient re-ranking of the candidate structures, leading to a systematic enhancement of the recall and precision rates of the tools (1.5- to 7-fold increase in the F1 score).
Our results clearly demonstrate the importance of considering taxonomic information when annotating specialized metabolites. This requires access to structural data that are systematically documented with their biological origin, both for new and for previously reported NPs. In this respect, the establishment of an open structural DB of specialized metabolites and their associated metadata (particularly biological sources) is timely and critical for the NP research community.
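
The re-ranking idea described above can be illustrated with a minimal sketch. It is not the script proposed in this work: the class names, the additive combination rule, and the bonus values in `RANK_BONUS` are all assumptions chosen for illustration. The sketch only shows the general principle that a candidate structure reported from an organism taxonomically close to the analyzed sample (same family, genus, or species) receives a larger score than a candidate with a similar spectral match but a distant biological source.

```python
from dataclasses import dataclass

# Hypothetical bonuses per shared taxonomic rank (illustrative values only,
# not taken from the source). A closer rank earns a larger bonus.
RANK_BONUS = {"family": 1.0, "genus": 2.0, "species": 3.0}


@dataclass(frozen=True)
class Taxon:
    family: str
    genus: str
    species: str  # full binomial, e.g. "Glaucium flavum"


@dataclass(frozen=True)
class Candidate:
    structure_id: str
    spectral_score: float      # assumed normalised to the range 0..1
    sources: tuple = ()        # Taxon objects the structure was reported from


def taxonomic_bonus(sample: Taxon, sources: tuple) -> float:
    """Return the bonus for the closest rank shared with any reported source."""
    best = 0.0
    for src in sources:
        # Check from the most specific rank to the least specific one.
        for rank in ("species", "genus", "family"):
            if getattr(src, rank) == getattr(sample, rank):
                best = max(best, RANK_BONUS[rank])
                break
    return best


def rerank(sample: Taxon, candidates: list) -> list:
    """Sort candidates by spectral score plus taxonomic bonus, best first."""
    scored = [
        (c.spectral_score + taxonomic_bonus(sample, c.sources), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


if __name__ == "__main__":
    sample = Taxon("Papaveraceae", "Glaucium", "Glaucium flavum")
    candidates = [
        Candidate("A", 0.9, (Taxon("Rutaceae", "Citrus", "Citrus sinensis"),)),
        Candidate("B", 0.6, (Taxon("Papaveraceae", "Glaucium", "Glaucium corniculatum"),)),
        Candidate("C", 0.5, (Taxon("Papaveraceae", "Glaucium", "Glaucium flavum"),)),
    ]
    for total, cand in rerank(sample, candidates):
        print(cand.structure_id, round(total, 2))
```

In this toy example the candidate with the best spectral score but an unrelated biological source is ranked last, while the candidates reported from the same species or genus move to the top. In practice, the relative size of the bonuses compared with the spectral score range would need to be tuned against a benchmark set.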
