A Roadmap for Natural Product Discovery Based on Large-Scale Genomics and Metabolomics

Actinobacteria encode a wealth of natural product biosynthetic gene clusters (NPGCs), whose systematic study is complicated by numerous repetitive motifs. By combining several metrics, we developed a method for globally classifying these gene clusters into gene cluster families (GCFs) and used it to analyze the biosynthetic capacity of Actinobacteria in 830 genome sequences, including 344 obtained for this project. The resulting GCF network, comprising 11,422 gene clusters grouped into 4,122 GCFs, was validated in hundreds of strains by correlating confident mass spectrometric detection of known small molecules with the presence or absence of their established biosynthetic gene clusters. The method also linked previously unassigned GCFs to known natural products, an approach that will enable de novo, bioassay-free discovery of novel natural products from large data sets. Extrapolation from the 830-genome data set suggests that Actinobacteria encode hundreds of thousands of future drug leads, while the strong correlation between phylogeny and GCFs frames a roadmap for accessing them efficiently.
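The validation step described above, correlating detection of a known molecule with the presence or absence of a gene cluster family across strains, can be illustrated with a minimal sketch. The abstract does not state which scoring function was used, so the phi coefficient below is an assumption chosen for illustration, and the function name, strain labels, and example sets are all hypothetical.

```python
from math import sqrt


def phi_coefficient(detected, has_gcf, strains):
    """Correlate molecule detection with GCF presence across strains.

    detected: set of strains where the molecule was detected by MS (hypothetical input)
    has_gcf:  set of strains whose genome contains the GCF (hypothetical input)
    strains:  all strains considered
    Returns the phi coefficient in [-1, 1]; 0.0 if it is undefined.
    NOTE: the phi coefficient is an illustrative choice, not the paper's stated metric.
    """
    both = sum(1 for s in strains if s in detected and s in has_gcf)
    ms_only = sum(1 for s in strains if s in detected and s not in has_gcf)
    gcf_only = sum(1 for s in strains if s not in detected and s in has_gcf)
    neither = sum(1 for s in strains if s not in detected and s not in has_gcf)

    denom = sqrt(
        (both + ms_only) * (gcf_only + neither)
        * (both + gcf_only) * (ms_only + neither)
    )
    if denom == 0:
        # One variable is constant across all strains; correlation is undefined.
        return 0.0
    return (both * neither - ms_only * gcf_only) / denom


if __name__ == "__main__":
    strains = ["s1", "s2", "s3", "s4"]
    # Molecule detected exactly in the strains that carry the GCF.
    print(phi_coefficient({"s1", "s2"}, {"s1", "s2"}, strains))
```

Scoring every molecule against every GCF in this way and keeping the highest-scoring pairs is one plausible route to proposing links between unassigned GCFs and known natural products; a real analysis would also need to account for clusters that are present but not expressed under the growth conditions used.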
