Genomes to natural products PRediction Informatics for Secondary Metabolomes (PRISM)

Microbial natural products are an invaluable source of evolved bioactive small molecules and pharmaceutical agents. Next-generation and metagenomic sequencing indicates untapped genomic potential, yet high rediscovery rates of known metabolites increasingly frustrate conventional natural product screening programs. New methods to connect biosynthetic gene clusters to novel chemical scaffolds are therefore critical to enable the targeted discovery of genetically encoded natural products. Here, we present PRISM, a computational resource for the identification of biosynthetic gene clusters, prediction of genetically encoded nonribosomal peptides and type I and II polyketides, and bio- and cheminformatic dereplication of known natural products. PRISM implements novel algorithms that render it uniquely capable of predicting type II polyketides, deoxygenated sugars, and starter units, making it a comprehensive genome-guided chemical structure prediction engine. When multiple potential substrates are consistent with biosynthetic logic, a library of 57 tailoring reactions is used to generate a combinatorial library of candidate scaffolds. We compare the accuracy of PRISM to existing genomic analysis platforms. PRISM is an open-source, user-friendly web application available at http://magarveylab.ca/prism/.
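
The combinatorial step mentioned in the abstract can be illustrated with a minimal sketch. This is not PRISM's implementation: the function name `enumerate_scaffolds`, the per-module candidate lists, and the monomer labels are all hypothetical, chosen only to show how one candidate scaffold arises for every consistent choice of substrate at each ambiguous position.

```python
from itertools import product


def enumerate_scaffolds(candidate_substrates):
    """Return every scaffold obtained by picking one candidate per module.

    candidate_substrates is a list of lists; each inner list holds the
    substrates judged plausible for one module (hypothetical input format).
    """
    return [tuple(combo) for combo in product(*candidate_substrates)]


# Hypothetical example: modules 1 and 3 are ambiguous, module 2 is not.
modules = [["Val", "Ile"], ["Ser"], ["Phe", "Tyr"]]
scaffolds = enumerate_scaffolds(modules)

for scaffold in scaffolds:
    print("-".join(scaffold))
```

The library size is the product of the number of candidates at each position (here 2 x 1 x 2 = 4), which is why constraining choices with biosynthetic logic matters before enumeration.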
