Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters

Abstract: Many drugs are derived from natural products, small molecules produced by microorganisms and plants. Natural products have diverse chemical structures, but the biosynthetic pathways producing them are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows core biosynthetic enzymes to be identified with genome mining strategies based on the sequence similarity of the enzymes and genes involved. However, mining for a variety of BGCs quickly reaches a level of complexity at which manual analysis is no longer feasible and automated genome mining pipelines, such as the antiSMASH software, are required. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats, such as rule-based BGC detection, sequence and annotation quality, and cluster boundary prediction, all of which must be considered when planning and performing genome mining studies and analyzing their results.
