A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data

Genome mining has become a key technology to explore and exploit natural product diversity through the identification and analysis of biosynthetic gene clusters (BGCs). Initially, this was performed on a single-genome basis; currently, the process is being scaled up to the mining of pan-genomes of entire genera, complete strain collections and metagenomic datasets from which thousands of bacterial genomes can be extracted at once. However, no bioinformatic framework is currently available for the effective analysis of datasets of this size and complexity. Here, we provide a streamlined computational workflow, tightly integrated with antiSMASH and MIBiG, that consists of two new software tools, BiG-SCAPE and CORASON. BiG-SCAPE facilitates rapid calculation and interactive visual exploration of BGC sequence similarity networks, groups gene clusters at multiple hierarchical levels, and includes a ‘glocal’ alignment mode that accurately groups both complete and fragmented BGCs. CORASON employs a phylogenomic approach to elucidate the detailed evolutionary relationships between gene clusters by computing high-resolution multi-locus phylogenies of all BGCs within and across gene cluster families (GCFs), and allows researchers to comprehensively identify all genomic contexts in which particular biosynthetic gene cassettes are found. We validate BiG-SCAPE by correlating its GCF output with metabolomic data across 403 actinobacterial strains. Furthermore, we demonstrate the discovery potential of the platform by using CORASON to comprehensively map the phylogenetic diversity of the large detoxin/rimosamide gene cluster clan, prioritizing three new detoxin families for subsequent characterization of six new analogs using isotopic labeling and analysis of tandem mass spectrometric data.
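As a rough illustration of what building a BGC similarity network and grouping it at more than one level involves, here is a minimal Python sketch. It is not BiG-SCAPE's algorithm or distance metric: it assumes each BGC has already been reduced to a list of protein-domain labels, uses a plain Jaccard distance between those domain sets, and groups BGCs as connected components of the network at a chosen distance cutoff. The BGC identifiers, domain contents and cutoffs are all hypothetical. A strict cutoff yields small, tight groups, while a looser one merges them into broader groups, loosely analogous to the family and clan levels mentioned above.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two domain sets (1.0 means identical content)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def families(bgcs, cutoff):
    """Group BGCs into connected components of a similarity network.

    An edge joins two BGCs when their Jaccard distance (1 - index) is
    <= cutoff. Components are found with a simple union-find.
    """
    names = sorted(bgcs)
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if 1.0 - jaccard(bgcs[a], bgcs[b]) <= cutoff:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical BGCs described only by their domain labels.
bgcs = {
    "bgc_A": ["AMP-binding", "Condensation", "PP-binding", "Thioesterase"],
    "bgc_B": ["AMP-binding", "Condensation", "PP-binding", "Thioesterase",
              "Methyltransf_11"],
    "bgc_C": ["AMP-binding", "Condensation", "PP-binding", "PKS_KS", "PKS_AT"],
    "bgc_D": ["PKS_KS", "PKS_AT", "ketoacyl-red"],
}

for cutoff in (0.3, 0.6, 0.7):
    print(cutoff, families(bgcs, cutoff))
```

With these toy inputs, a cutoff of 0.3 keeps only bgc_A and bgc_B together, 0.6 pulls in bgc_C, and 0.7 joins all four into a single group. A real pipeline would replace the Jaccard term with a richer, sequence-aware distance and would typically use a clustering step more refined than connected components.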
