The secondary metabolite bioinformatics portal: Computational tools to facilitate synthetic biology of secondary metabolite production

Natural products are among the most important sources of lead molecules for drug discovery. With the development of affordable whole-genome sequencing technologies and other ’omics tools, the field of natural products research is undergoing a paradigm shift. For decades, access to this group of compounds came mainly through analytical and chemical methods; genomics-based methods now offer complementary approaches to find, identify and characterize such molecules. This shift has also created a high demand for computational tools to assist researchers in their daily work. In this context, this review summarizes the tools and databases currently available to mine, identify and characterize natural product biosynthesis pathways and their producers based on ’omics data. A web portal, the Secondary Metabolite Bioinformatics Portal (SMBP, http://www.secondarymetabolites.org), is introduced to provide a one-stop catalog of, and links to, these bioinformatics resources. In addition, an outlook is presented on how existing tools and those yet to be developed will influence synthetic biology approaches in the natural products field.
