CoABind: a novel algorithm for Coenzyme A (CoA)‐ and CoA derivatives‐binding residues prediction

Motivation: Coenzyme A (CoA)‐protein binding plays an important role in various cellular functions and metabolic pathways. However, no computational method is currently available for predicting CoA‐binding residues.

Results: We developed three methods for predicting CoA‐ and CoA derivatives‐binding residues: an ab initio method, SVMpred; a template‐based method, TemPred; and a consensus‐based method, CoABind. In SVMpred, a comprehensive set of features is derived from two complementary sequence profiles and from the predicted secondary structure and solvent accessibility, and a support vector machine serves as the classifier. In TemPred, predictions are transferred from homologous templates in the training set, which are detected by the program HHsearch. On an independent test set of 73 proteins, SVMpred and TemPred achieve Matthews correlation coefficients (MCC) of 0.438 and 0.481, respectively. Analysis of their predictions shows that the two methods are complementary. We therefore combined them into the third method, CoABind, which further improves the MCC to 0.489 on the same set. Experiments demonstrate that the proposed methods significantly outperform COACH, the state‐of‐the‐art general‐purpose algorithm for ligand‐binding residue prediction. As the first method of its kind, CoABind is expected to be helpful for studying CoA‐protein interactions.

Availability and implementation: http://yanglab.nankai.edu.cn/CoABind

Supplementary information: Supplementary data are available at Bioinformatics online.
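The MCC values reported above are a standard measure for binary residue‐level predictions. As a minimal sketch, assuming each residue carries a label of 1 (binding) or 0 (non‐binding), the coefficient can be computed from the confusion counts as shown below. The function name and the toy labels are illustrative only and are not taken from the CoABind implementation or its data.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Residue-level MCC for binary labels (1 = binding, 0 = non-binding)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is taken as 0 when a margin is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example (hypothetical labels, not data from the paper).
true_labels = [1, 1, 0, 0, 0, 1, 0, 0]
pred_labels = [1, 0, 0, 0, 1, 1, 0, 0]
mcc = matthews_corrcoef(true_labels, pred_labels)
print(f"MCC = {mcc:.3f}")
```

MCC ranges from -1 to 1 and accounts for all four confusion counts, which makes it more informative than accuracy when binding residues are a small minority of all residues.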
