Focused natural product elucidation by prioritizing high-throughput metabolomic studies with machine learning

Bacteria of the genera Photorhabdus and Xenorhabdus produce a plethora of natural products that support their similar symbiotic lifecycles. For many of these compounds, the specific bioactivities are unknown. A common challenge when prioritizing natural product research efforts is the rediscovery of identical (or highly similar) compounds from different strains. Linking genome sequence to metabolite production can help overcome this problem. However, genome sequences are typically not available for entire strain collections. Here we perform a comprehensive metabolic screen using HPLC-MS data from a collection of 114 strains (58 Photorhabdus and 56 Xenorhabdus) collected across Thailand, and we explore the metabolic variation among the strains in relation to several abiotic factors. We use machine learning to rank the importance of individual metabolites in predicting each of the associated metadata variables. This approach allowed us to prioritize metabolites for natural product investigation and led to the identification of previously unknown compounds. The top three highest-ranking features were associated with Xenorhabdus and were attributed to the same chemical entity, cyclo(tetrahydroxybutyrate). This work addresses the need for prioritization in high-throughput metabolomic studies and demonstrates the viability of such an approach for future research.
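The idea of ranking metabolites by how informative they are about a metadata variable can be illustrated with a small, self-contained sketch. It does not reproduce the study's machine-learning pipeline. As a simpler stand-in, it scores each metabolite feature by the largest Gini-impurity decrease achievable with a single threshold split (a decision stump) when separating a metadata label such as genus. This is similar in spirit to the impurity-decrease importance that decision-tree models accumulate over their splits. The function names, feature names, intensities, and labels are hypothetical.

```python
"""Rank metabolite features by how well one threshold split separates a label.

A simplified stand-in for tree-based feature importance (hypothetical example,
not the study's actual pipeline).
"""
from typing import Dict, List, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a collection of class labels (0 = perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[str, int] = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gain(values: Sequence[float], labels: Sequence[str]) -> float:
    """Largest weighted Gini decrease over all thresholds on one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Candidate thresholds: midpoints between consecutive distinct values.
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= thr]
        right = [lab for v, lab in zip(values, labels) if v > thr]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(
    matrix: Dict[str, List[float]], labels: List[str]
) -> List[Tuple[str, float]]:
    """Return (feature, gain) pairs sorted from most to least informative."""
    scores = [(name, best_split_gain(vals, labels)) for name, vals in matrix.items()]
    # Descending gain; ties broken by feature name so the order is deterministic.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical intensities for six strains; labels give each strain's genus.
genus = ["Xenorhabdus"] * 3 + ["Photorhabdus"] * 3
features = {
    "mz_345.1": [9.0, 8.5, 9.2, 0.1, 0.0, 0.3],  # detected in one genus only
    "mz_210.4": [1.0, 5.0, 2.0, 4.0, 1.5, 3.0],  # no clear genus pattern
}
for name, gain in rank_features(features, genus):
    print(f"{name}\t{gain:.3f}")
```

Here the genus-specific feature `mz_345.1` separates the two genera perfectly and ranks first (gain 0.500), while `mz_210.4` achieves at most a gain of 0.100. A real analysis would fit a full model per metadata variable and use a more robust importance measure, but the ranking logic is the same: features that best explain the metadata rise to the top.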
