DDAP: docking domain affinity and biosynthetic pathway prediction tool for type I polyketide synthases

Summary: DDAP is a tool for predicting the biosynthetic pathways of the products of type I modular polyketide synthases (PKSs), with a focus on more accurately predicting the order of proteins and substrates in the pathway. In this study, module docking domain (DD) affinity prediction on a hold-out test set reached AUC = 0.88, and pathway prediction reached an MRR of 0.67. DDAP has several advantages over previous informatics tools: (i) it does not rely on large databases, which makes it efficient; (ii) the predicted DD affinity is expressed as a probability (0 to 1), which is more intuitive than a raw score; and (iii) its performance is competitive with the currently popular rule-based algorithm. To the best of our knowledge, DDAP is the first machine-learning-based algorithm for type I PKS pathway prediction. We also established the first database of type I modular PKSs, featuring comprehensive annotation of the available docking domain information in bacterial biosynthetic pathways.

Availability and implementation: The DDAP database is available at https://tylii.github.io/ddap. The DDAP prediction algorithm is freely available on GitHub (https://github.com/tylii/ddap) and released under the MIT license.

Contact: ukarvind@umich.edu
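To make the pathway-level metric concrete, the following is a minimal sketch of how an MRR score can be computed, assuming MRR here denotes the standard mean reciprocal rank: for each gene cluster, candidate protein orderings are ranked, and the score is the average of 1/rank of the true ordering. The function name, the toy protein labels, and the data layout are all hypothetical and are not taken from the DDAP code base.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    ranked_candidates: Sequence[Sequence[Hashable]],
    true_items: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the true item in each ranked list.

    A list that does not contain its true item contributes 0.
    (Hypothetical helper for illustration; not part of DDAP.)
    """
    if len(ranked_candidates) != len(true_items):
        raise ValueError("need one true item per ranked list")
    if not ranked_candidates:
        raise ValueError("need at least one ranked list")

    total = 0.0
    for ranked, truth in zip(ranked_candidates, true_items):
        for position, candidate in enumerate(ranked, start=1):
            if candidate == truth:
                total += 1.0 / position
                break
    return total / len(ranked_candidates)


# Toy example: three clusters, each with candidate protein orderings
# sorted from most to least likely (labels are made up).
rankings = [
    [("A", "B", "C"), ("A", "C", "B")],                     # truth at rank 1
    [("B", "A", "C"), ("A", "B", "C")],                     # truth at rank 2
    [("C", "B", "A"), ("B", "C", "A"), ("A", "B", "C")],    # truth at rank 3
]
truths = [("A", "B", "C")] * 3

mrr = mean_reciprocal_rank(rankings, truths)
print(round(mrr, 3))
```

Under this definition, a score of 1.0 would mean the true ordering is always ranked first, so values closer to 1 indicate that the correct pathway tends to appear near the top of the candidate list.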
