Use B-factor related features for accurate classification between protein binding interfaces and crystal packing contacts

Background

Distinguishing true protein interactions from crystal packing contacts is important for structural bioinformatics, which needs accurate classification of a rapidly increasing number of protein structures. This expanding volume of data contains many unannotated crystal contacts as well as false annotations. Tools have been proposed to address the problem, but challenges remain, such as low performance when the training and test data mix interfaces with diverse contact-area sizes.

Methods and results

The B factor quantifies the vibrational motion of an atom and is a more relevant feature than interface size for characterizing protein binding. We propose three B factor related features for classifying biological interfaces versus crystal packing contacts. The first is the sum of the normalized B factors of the interfacial atoms in the contact area; the second is the average interfacial B factor per residue in the chain; the third is the average number of interfacial atoms with a negative normalized B factor per residue in the chain. We investigate the distribution properties of these basic features and of a compound feature on four datasets of biological binding and crystal packing, and on a binding-only dataset with known binding affinity. We also compare the cross-dataset classification performance of these features with existing methods and with interface area, a widely used feature and the most effective one to date. The results demonstrate that our features remarkably outperform both the interface area approach and the existing prediction methods in many tests on all of these datasets.

Conclusions

The proposed B factor related features are more effective than interface area at distinguishing crystal packing from biological binding interfaces. Our computational methods have potential for large-scale, accurate identification of biological interactions from the experimentally determined structures stored in the PDB, which may have diverse interface sizes.
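To make the three B factor related features concrete, the following is a minimal Python sketch, not the authors' implementation. Several details are assumptions, since the abstract does not specify them: normalization is taken to be a z-score over all atoms of the chain, the second feature is computed from the normalized interfacial B factors divided by the number of residues in the chain, and the `Atom` record, its `at_interface` flag, and the function name `b_factor_features` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Atom:
    """Hypothetical atom record; field names are illustrative only."""
    residue_id: int      # residue the atom belongs to
    b_factor: float      # raw crystallographic B factor
    at_interface: bool   # True if the atom lies in the contact area


def b_factor_features(atoms):
    """Compute three B factor related features for one chain.

    Assumption: B factors are normalized as z-scores over all atoms
    of the chain; the paper may use a different normalization.
    """
    b_values = [a.b_factor for a in atoms]
    mu = mean(b_values)
    sigma = pstdev(b_values)
    if sigma == 0:
        raise ValueError("B factors are constant; z-score is undefined")

    # Number of distinct residues in the chain (denominator for averages).
    n_residues = len({a.residue_id for a in atoms})

    # Normalized B factors of the interfacial atoms only.
    z_interface = [(a.b_factor - mu) / sigma for a in atoms if a.at_interface]

    # Feature 1: sum of normalized B factors of interfacial atoms.
    sum_norm_b = sum(z_interface)
    # Feature 2 (assumed reading): interfacial B factor averaged per residue.
    avg_b_per_residue = sum_norm_b / n_residues
    # Feature 3: interfacial atoms with negative normalized B, per residue.
    neg_atoms_per_residue = sum(1 for z in z_interface if z < 0) / n_residues

    return {
        "sum_norm_b": sum_norm_b,
        "avg_b_per_residue": avg_b_per_residue,
        "neg_atoms_per_residue": neg_atoms_per_residue,
    }
```

In practice the atom records would be read from a PDB file and the interface flag derived from a contact definition; both steps are outside the scope of this sketch.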
