Classification of the Adenylation and Acyl-Transferase Activity of NRPS and PKS Systems Using Ensembles of Substrate Specific Hidden Markov Models

There is growing interest in the non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) of microbes, fungi and plants because they can produce bioactive compounds such as antibiotics. The ability to identify the substrate specificity of the adenylation (A) and acyl-transferase (AT) domains of these enzymes is essential to rationally deduce or engineer new products. Here we report a Hidden Markov Model (HMM)-based ensemble method that predicts substrate specificity with high quality. We collected a new reference set of experimentally validated sequences. An initial classification based on alignment and Neighbor Joining was performed, in line with most previously published prediction methods. We then created and tested single substrate-specific HMMs and found that their use significantly improved correct identification for both A and AT domains. A major advantage of HMMs is that they remove the dependency on multiple sequence alignment and residue selection that hampers alignment-based clustering methods. Using our models, we obtained a high prediction quality for the substrate specificity of the A domains, comparable to that of two recently published tools, NRPSsp and NRPSpredictor2, which use HMMs and Support Vector Machines, respectively. Moreover, replacing the single substrate-specific HMMs with ensembles of models clearly increased prediction quality. We argue that the superiority of the ensemble over the single model is caused by the way substrate specificity evolves in the studied systems. It is likely that this also holds true for other protein domains. The ensemble predictor has been implemented in a simple web-based tool that is available at http://www.cmbi.ru.nl/NRPS-PKS-substrate-predictor/.
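The ensemble idea can be illustrated with a minimal Python sketch. It assumes each substrate is represented by several HMMs and that a query domain has already been scored against every model (for example with an HMM search tool). The abstract does not say how the scores within an ensemble are combined, so taking the maximum score per substrate is an assumption made here for illustration only; the function name `predict_substrate` and the example scores are hypothetical.

```python
from typing import Dict, List, Tuple


def predict_substrate(ensemble_scores: Dict[str, List[float]]) -> Tuple[str, float]:
    """Pick the substrate whose ensemble contains the best-scoring model.

    ensemble_scores maps a substrate label to the scores that a query
    domain obtained against each HMM in that substrate's ensemble.
    Assumption: an ensemble is summarised by its maximum score.
    """
    # Summarise every non-empty ensemble by its best model score.
    best_per_substrate = {
        substrate: max(scores)
        for substrate, scores in ensemble_scores.items()
        if scores
    }
    if not best_per_substrate:
        raise ValueError("no model scores supplied")
    # The predicted substrate is the one with the highest summary score.
    substrate = max(best_per_substrate, key=best_per_substrate.get)
    return substrate, best_per_substrate[substrate]


# Hypothetical scores of one A domain against three substrate ensembles.
example_scores = {
    "Phe": [41.2, 118.7, 63.0],
    "Leu": [95.4, 88.1],
    "Val": [12.3],
}
prediction = predict_substrate(example_scores)
print(prediction)
```

In this toy input, a single Phe model that scored 41.2 would lose to Leu, whereas the Phe ensemble wins because one of its members fits the query well. This shows how several models per substrate could capture subgroups that a single model averages away.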
