Taxonomically Informed Scoring Enhances Confidence in Natural Products Annotation

Mass spectrometry (MS) offers unrivalled sensitivity for the metabolite profiling of complex biological matrices encountered in natural products (NP) research. The massive and complex sets of spectral data generated by such platforms require computational approaches for their interpretation. Within such approaches, computational metabolite annotation automatically links spectral data to candidate structures via a score, which is usually established between the acquired data and experimental or theoretical spectral databases (DB). This process yields several candidate structures for each MS feature. However, at this stage, obtaining a high level of annotation confidence remains a challenge, notably because of the extensive chemodiversity of specialized metabolomes. Designing a metascore is one way to capture complementary experimental attributes and improve the annotation process. Here, we show that integrating the taxonomic position of the biological source of both the analyzed samples and the candidate structures enhances confidence in metabolite annotation. A script is proposed to automatically input such information at various levels of granularity (species, genus, and family) and to complement the score obtained between experimental spectral data and the output of available computational metabolite annotation tools (ISDB-DNP, MS-Finder, Sirius). In all cases, taking the taxonomic distance into account allowed an efficient re-ranking of the candidate structures, leading to a systematic enhancement of the recall and precision rates of the tools (1.5- to 7-fold increase in the F1 score). Our results clearly demonstrate the importance of considering taxonomic information when annotating specialized metabolites. This requires access to structural data that are systematically documented with their biological origin, both for new and previously reported NPs. In this respect, the establishment of an open structural DB of specialized metabolites and their associated metadata, particularly biological sources, is timely and critical for the NP research community.
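
As an illustration of the re-ranking idea described above, the following minimal Python sketch adds a taxonomy-dependent bonus to a spectral score and sorts the candidates by the combined value. It is not the script proposed in this work: the class names, the `RANK_BONUS` weights, the assumption that spectral scores are normalized to the 0-1 range, and the toy taxa are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical bonus weights (assumption): the finer the shared rank, the larger the bonus.
RANK_BONUS = {"species": 3.0, "genus": 2.0, "family": 1.0}
RANKS_FINE_TO_COARSE = ("species", "genus", "family")


@dataclass(frozen=True)
class Taxon:
    family: str
    genus: str
    species: str  # full binomial name, so that species matches are unambiguous


@dataclass(frozen=True)
class Candidate:
    structure_id: str
    spectral_score: float            # assumed to be normalized to the 0-1 range
    sources: Tuple[Taxon, ...] = ()  # organisms the structure has been reported from


def taxonomic_bonus(sample: Taxon, sources: Tuple[Taxon, ...]) -> float:
    """Return the bonus for the finest rank shared by the sample and any reported source."""
    best = 0.0
    for source in sources:
        for rank in RANKS_FINE_TO_COARSE:
            if getattr(source, rank) == getattr(sample, rank):
                best = max(best, RANK_BONUS[rank])
                break  # finest shared rank for this source found
    return best


def combined_score(sample: Taxon, candidate: Candidate) -> float:
    """Spectral score complemented by the taxonomic bonus."""
    return candidate.spectral_score + taxonomic_bonus(sample, candidate.sources)


def rerank(sample: Taxon, candidates):
    """Sort candidates from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: combined_score(sample, c), reverse=True)


# Toy data (entirely made up) for one MS feature.
sample = Taxon(family="FamilyA", genus="GenusA", species="GenusA alpha")
candidates = [
    Candidate("cand_1", 0.9, (Taxon("FamilyB", "GenusB", "GenusB beta"),)),
    Candidate("cand_2", 0.6, (Taxon("FamilyA", "GenusA", "GenusA gamma"),)),
    Candidate("cand_3", 0.7, (Taxon("FamilyA", "GenusC", "GenusC delta"),)),
]
ranked_ids = [c.structure_id for c in rerank(sample, candidates)]
```

In this toy case the candidate with the best spectral score (`cand_1`) drops to last place because it has only been reported from an unrelated family, while the candidate reported from the same genus as the sample moves to the top. With weights this large relative to the spectral score, taxonomy dominates the ranking; in practice the weights would need to be tuned against a benchmark set.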
