Paper information - Identification of ligand-binding residues using protein sequence profile alignment and query-specific support vector machine model.

Abstract: Information embedded in the ligand-binding residues (LBRs) of proteins is important for understanding protein function. Accurately identifying potential LBRs remains a challenging problem, especially when only the protein sequence is given. In this paper, we establish a new query-specific computational method, named I-LBR, for identifying LBRs without directly using protein 3D structure information. I-LBR includes two modes, I-LBRGP and I-LBRLS, for general-purpose and ligand-specific LBR identification, respectively. In both modes, I-LBR first constructs a specific training subset based on the query sequence information, then uses the support vector machine (SVM) algorithm to learn an LBR identification model, and finally predicts the probability that each residue in the query protein belongs to the LBR class. Experimental results on four test datasets demonstrate that I-LBRLS is a better choice than I-LBRGP when the ligand type or types that the query protein binds are known. Compared with other state-of-the-art LBR identification methods, I-LBR achieves better or comparable performance. The I-LBR web server and the dataset used in this study are freely available for academic use at https://jun-csbio.github.io/I-LBR .
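The three steps named in the abstract (build a query-specific training subset, train an SVM, score each query residue) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: amino-acid composition with cosine similarity stands in for sequence profile alignment, a Pegasos-style linear SVM stands in for whatever SVM implementation the paper uses, the logistic squashing gives an uncalibrated score rather than a true probability, and all function names and parameters (`select_training_subset`, `top_k`, `lam`, and so on) are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Amino-acid composition vector (assumed stand-in for a sequence profile)."""
    counts = Counter(seq)
    return [counts.get(a, 0) / len(seq) for a in AMINO_ACIDS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_training_subset(query, training_set, top_k):
    """Step 1: keep the top_k training proteins most similar to the query."""
    q = composition(query)
    ranked = sorted(training_set,
                    key=lambda p: cosine(q, composition(p[0])),
                    reverse=True)
    return ranked[:top_k]

def window_features(seq, i, half=1):
    """One-hot encode the residues in a window centred on position i."""
    feats = []
    for j in range(i - half, i + half + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= j < len(seq) and seq[j] in AMINO_ACIDS:
            one_hot[AMINO_ACIDS.index(seq[j])] = 1.0
        feats.extend(one_hot)
    feats.append(1.0)  # constant feature acting as a bias term
    return feats

def train_linear_svm(samples, lam=0.01, epochs=50):
    """Step 2: Pegasos-style sub-gradient descent on the hinge loss.

    Labels are +1 (binding) or -1 (non-binding); samples are visited
    in a fixed cyclic order so the result is deterministic.
    """
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [wi * (1.0 - eta * lam) for wi in w]
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict_scores(query, w):
    """Step 3: squash SVM decision values into [0, 1] with a logistic function."""
    scores = []
    for i in range(len(query)):
        d = sum(wi * xi for wi, xi in zip(w, window_features(query, i)))
        scores.append(1.0 / (1.0 + math.exp(-d)))
    return scores

def identify_lbrs(query, training_set, top_k=2):
    """Run the three steps; training_set holds (sequence, '0'/'1' label string) pairs."""
    subset = select_training_subset(query, training_set, top_k)
    samples = [(window_features(seq, i), 1 if lab == "1" else -1)
               for seq, labels in subset
                for i, lab in enumerate(labels)]
    return predict_scores(query, train_linear_svm(samples))
```

Calling `identify_lbrs("AAHGG", [("AAHAA", "00100"), ("GGHGG", "00100"), ("WWWWW", "00000")])` returns one score per query residue. The point of the sketch is that the model is retrained for every query on only the most similar training proteins, which is what makes the approach query-specific. A ligand-specific mode would, under the same assumptions, simply restrict `training_set` to proteins that bind the ligand type of interest.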

Hu Jun | Zhang Guijun | Rao Liang | Fan Xueqiang
