Propensity vectors of low‐ASA residue pairs in the distinction of protein interactions

We introduce low‐ASA residue pairs as classification features for distinguishing different types of protein interactions. A low‐ASA residue pair is defined as two contacting residues, one from each chain, that both have a small solvent accessible surface area (ASA). This notion of residue pairs is novel in that it is the first to combine residue pairs with the O‐ring theory, an influential proposition stating that the binding hot spots at an interface are often surrounded by a ring of energetically less important residues. Because binding hot spots are central to the stability of protein interactions, we believe that low‐ASA residue pairs can sharpen the distinction between interaction types. The main part of our feature vector is 210‐dimensional, with one feature for each possible low‐ASA residue pair; the value of every feature is determined by a propensity measure. Our classification method, OringPV, supplies these propensity vectors of protein interactions to a support vector machine. OringPV is tested on three benchmark datasets for a variety of classification tasks, including the distinction between crystal packing and biological interactions and the distinction between two different types of biological interactions. The evaluation frameworks include within‐dataset comparison, cross‐dataset comparison, and leave‐one‐out cross‐validation. The results show that low‐ASA residue pairs and the propensity vector description of protein interactions are highly discriminative. In particular, many tests of cross‐dataset generalization achieved excellent recall and overall accuracy, substantially outperforming existing benchmark methods. Proteins 2010. © 2009 Wiley‐Liss, Inc.
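
The construction of the 210‐dimensional vector can be illustrated with a minimal sketch. It is not the OringPV implementation. It assumes that the 210 dimensions correspond to the unordered pairs of the 20 standard amino acid types (20 × 21 / 2 = 210). The ASA cutoff, the input format, and the function names `low_asa_pairs` and `propensity_vector` are hypothetical, and plain relative frequency stands in for the propensity measure, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations_with_replacement

# One-letter codes of the 20 standard amino acids, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All unordered residue-type pairs, including same-type pairs: 20 * 21 / 2 = 210.
PAIRS = list(combinations_with_replacement(AMINO_ACIDS, 2))
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def pair_key(res_a, res_b):
    """Return the order-independent key for a residue pair, e.g. ('A', 'L')."""
    return (res_a, res_b) if res_a <= res_b else (res_b, res_a)


def low_asa_pairs(contacts, asa_cutoff):
    """Keep the contact pairs in which both residues have ASA <= asa_cutoff.

    `contacts` is an iterable of (res_a, asa_a, res_b, asa_b) tuples, one per
    inter-chain residue contact. Both this format and the cutoff value are
    assumptions made for illustration.
    """
    return [
        pair_key(res_a, res_b)
        for res_a, asa_a, res_b, asa_b in contacts
        if asa_a <= asa_cutoff and asa_b <= asa_cutoff
    ]


def propensity_vector(pairs):
    """Map a list of low-ASA pairs to a 210-dimensional feature vector.

    Placeholder measure: the relative frequency of each pair type within the
    interface. The paper's actual propensity measure may differ.
    """
    counts = Counter(pairs)
    total = sum(counts.values())
    vector = [0.0] * len(PAIRS)
    if total == 0:
        return vector
    for pair, n in counts.items():
        vector[PAIR_INDEX[pair]] = n / total
    return vector


# Toy interface with three contacts; the second fails the ASA filter.
contacts = [("L", 5.0, "A", 2.0), ("K", 40.0, "D", 3.0), ("A", 1.0, "L", 0.5)]
vector = propensity_vector(low_asa_pairs(contacts, asa_cutoff=10.0))
```

Each interface yields one such vector, which could then be passed, together with its interaction-type label, to any support vector machine implementation for training.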
