Comprehensive inventory of protein complexes in the Protein Data Bank from consistent classification of interfaces

Background: Protein-protein interactions are ubiquitous and essential for all cellular processes. High-resolution X-ray crystallographic structures of protein complexes can reveal the details of their function and provide a basis for many computational and experimental approaches. Differentiating between biological and non-biological contacts and reconstructing the intact complex is a challenging computational problem. A successful solution can provide additional insights into the fundamental principles of biological recognition and reduce errors in the many algorithms and databases that use interaction information extracted from the Protein Data Bank (PDB).

Results: We have developed a method for identifying protein complexes in PDB X-ray structures by a four-step procedure: (1) comprehensively collecting all protein-protein interfaces; (2) clustering similar protein-protein interfaces together; (3) estimating the probability that each cluster is biologically relevant based on a diverse set of properties; and (4) combining these scores for each PDB entry in order to predict the complex structure. The resulting clusters of biologically relevant interfaces provide a reliable catalog of evolutionarily conserved protein-protein interactions. These interfaces, as well as the predicted protein complexes, are available from the Protein Interface Server (PInS) website (see Availability and requirements section).

Conclusion: Our method demonstrates an almost two-fold reduction of the annotation error rate as evaluated on a large benchmark set of complexes validated from the literature. We also estimate the relative contribution of each interface property to the accurate discrimination of biologically relevant interfaces and discuss possible directions for further improving the prediction method.
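As an illustration of step (4) of the procedure, the sketch below groups the chains of a single entry into predicted complexes. It is not the authors' implementation: the abstract does not say how the scores are combined, so the rule used here (link two chains whenever their interface belongs to a cluster whose estimated probability meets a threshold, then take connected components) is an assumption, as are the function name `predict_complexes`, the input layout, and the `threshold=0.5` default. Steps (1) to (3) are taken as given and enter only as inputs.

```python
from collections import defaultdict


def predict_complexes(chains, interfaces, cluster_prob, threshold=0.5):
    """Group chains into predicted complexes (hypothetical combination rule).

    chains:       chain identifiers present in one entry
    interfaces:   list of (chain_a, chain_b, cluster_id) tuples
    cluster_prob: dict mapping cluster_id to the estimated probability
                  that the cluster is biologically relevant
    threshold:    assumed cutoff; not taken from the source
    """
    # Union-find over chains: each chain starts as its own group.
    parent = {c: c for c in chains}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two chains of every interface judged biologically relevant.
    for a, b, cluster_id in interfaces:
        if cluster_prob.get(cluster_id, 0.0) >= threshold:
            parent[find(a)] = find(b)

    # Collect chains by root to obtain the connected components.
    groups = defaultdict(list)
    for c in chains:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())
```

For example, with chains A, B, C and D, interfaces A-B and C-D in a high-probability cluster and B-C in a low-probability one, the function returns two dimers, `[["A", "B"], ["C", "D"]]`, rather than a single tetramer.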
