Prediction of protein binding sites using physical and chemical descriptors and the support vector machine regression method

In this paper, a new continuous variable called the core-ratio is defined to describe the probability that a residue lies in a binding site, replacing the previous binary description of interface residues as 0 or 1. This allows the support vector machine regression method to be used to fit the core-ratio value and predict protein binding sites. We also design a new group of physical and chemical descriptors to characterize the binding sites. The new descriptors, used together with an averaging procedure, are more effective. Our test shows that the support vector regression (SVR) method gives much better prediction results than the support vector classification method.
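The idea of fitting a continuous per-residue target by regression, rather than classifying binary labels, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two descriptors, the synthetic "core-ratio" values, the linear kernel, the subgradient-descent solver, and the 0.5 cutoff are hypothetical and are not the descriptors, data, kernel, or solver used in the paper.

```python
import random

def predict(w, b, x):
    """Linear model: predicted core-ratio for one descriptor vector x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svr(X, y, epsilon=0.05, C=10.0, lr=0.005, epochs=1000):
    """Minimise 0.5*||w||^2 + C * mean(max(0, |f(x) - y| - epsilon))
    by plain subgradient descent (an illustrative solver only)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0              # gradient of the regulariser is w
        for xi, yi in zip(X, y):
            r = predict(w, b, xi) - yi     # signed residual
            if abs(r) > epsilon:           # only points outside the tube contribute
                s = 1.0 if r > 0 else -1.0
                for j in range(d):
                    gw[j] += C * s * xi[j] / n
                gb += C * s / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Synthetic residues: two hypothetical descriptors in [0, 1] and a
# made-up continuous target standing in for the core-ratio.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(100)]
y = [0.1 + 0.5 * x1 + 0.3 * x2 + rng.uniform(-0.02, 0.02) for x1, x2 in X]

w, b = train_linear_svr(X, y)

# Training-set error of the regression versus always predicting the mean.
mae = sum(abs(predict(w, b, xi) - yi) for xi, yi in zip(X, y)) / len(y)
y_mean = sum(y) / len(y)
baseline_mae = sum(abs(y_mean - yi) for yi in y) / len(y)

# Assumed post-processing: call a residue a binding-site candidate when
# its predicted value exceeds a hypothetical cutoff of 0.5.
candidates = [i for i, xi in enumerate(X) if predict(w, b, xi) > 0.5]

print(f"MAE {mae:.3f} vs mean-baseline {baseline_mae:.3f}; "
      f"{len(candidates)} candidate residues")
```

Because the regression output is continuous, the cutoff can be tuned after training, which a classifier trained directly on 0/1 labels does not offer in the same way. In practice a library SVR with a nonlinear kernel would replace the toy solver above.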
