Clustering biomolecular complexes by residue contacts similarity

Inaccuracies in computational molecular modeling methods are often counterweighed by brute‐force generation of a plethora of putative solutions. These are then typically sieved via structural clustering based on similarity measures such as the root mean square deviation (RMSD) of atomic positions. Although widely used, these measures suffer from several theoretical and technical limitations (e.g., the choice of regions for fitting) that impair their application to multicomponent systems (N > 2), large‐scale studies (e.g., interactomes), and other time‐critical scenarios. We present here a simple similarity measure for structural clustering based on atomic contacts, the fraction of common contacts, and compare it with the similarity measure most widely used in the protein docking community, interface backbone RMSD. We show that this method produces very compact clusters in a remarkably short time when applied to a collection of binary and multicomponent protein–protein and protein–DNA complexes. Furthermore, it allows easy clustering of similar conformations of multicomponent symmetrical assemblies in which chain permutations can occur. Simple contact‐based metrics should be applicable to other structural biology clustering problems, in particular for time‐critical or large‐scale endeavors. Proteins 2012; © 2012 Wiley Periodicals, Inc.
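
The fraction of common contacts can be illustrated with a minimal sketch. It assumes each model has already been reduced to a set of inter-chain residue contacts; the contact definition and distance cutoff are not stated in this abstract, so they are left out. The names `fraction_common_contacts` and `ignore_chains`, the tuple layout, and the choice to normalize by the size of the first set are illustrative assumptions, not the authors' implementation.

```python
from typing import AbstractSet, Hashable, Iterable, Set, Tuple

# Assumed representation of one contact (hypothetical layout):
# (chain of residue 1, number of residue 1, chain of residue 2, number of residue 2)
Contact = Tuple[str, int, str, int]


def fraction_common_contacts(
    reference: AbstractSet[Hashable], other: AbstractSet[Hashable]
) -> float:
    """Share of the contacts in `reference` that also occur in `other`.

    Assumption: the score is normalized by the size of `reference`,
    which makes it asymmetric in its two arguments.
    """
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)


def ignore_chains(contacts: Iterable[Contact]) -> Set[Tuple[int, int]]:
    """Drop chain identifiers and keep each residue-number pair in sorted order.

    Assumption: symmetric chains share residue numbering, so two models
    that differ only by a chain permutation yield the same set.
    """
    return {tuple(sorted((res_1, res_2))) for _, res_1, _, res_2 in contacts}


# Two toy models of a binary complex (made-up contacts, for illustration only).
model_a = {("A", 10, "B", 25), ("A", 11, "B", 25), ("A", 12, "B", 30), ("A", 40, "B", 31)}
model_b = {("A", 10, "B", 25), ("A", 11, "B", 25), ("A", 12, "B", 30)}

# The same interface as model_b, but with the two chains swapped.
model_c = {("A", 25, "B", 10), ("A", 25, "B", 11), ("A", 30, "B", 12)}

print(fraction_common_contacts(model_a, model_b))  # 3 of 4 contacts shared
print(fraction_common_contacts(model_b, model_a))  # 3 of 3 contacts shared
print(fraction_common_contacts(model_b, model_c))  # no shared chain-labelled contacts
print(fraction_common_contacts(ignore_chains(model_b), ignore_chains(model_c)))
```

Because the comparison is a set intersection, no structural superposition or choice of fitting region is needed, which is the source of the speed advantage the abstract contrasts with RMSD. Dropping chain labels, as in the last call, is one simple way a contact-based score could tolerate chain permutations in symmetrical assemblies.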
