Size‐independent comparison of protein three‐dimensional structures

Protein structures are routinely compared by their root‐mean‐square deviation (RMSD) in atomic coordinates after optimal rigid body superposition. What is not so clear is the significance of different RMSD values, particularly above the customary arbitrary cutoff for obvious similarity of 2–3 Å. Our earlier work argued for an intrinsic cutoff for protein similarity that varied with the number of residues in the polypeptide chains being compared. Here we introduce a new measure, ρ, of structural similarity based on RMSD that is independent of the sizes of the molecules involved, or of any other special properties of molecules. When ρ is less than 0.4–0.5, protein structures are visually recognized to be obviously similar, but the mathematically pleasing intrinsic cutoff of ρ < 1.0 corresponds to overall similarity in folding motif at a level not usually recognized until smoothing of the polypeptide chain path makes it striking. When the structures are scaled to unit radius of gyration and equal principal moments of inertia, the comparisons are even more universal, since they are no longer obscured by differences in overall size and ellipticity. With increasing chain length, the distribution of ρ for pairs of random structures is skewed to higher values, but the value for the best 1% of the comparisons rises only slowly with the number of residues. This level is close to an intrinsic cutoff between similar and dissimilar comparisons, namely the maximal scaled ρ possible for the two structures to be more similar to each other than one is to the other's mirror image. The intrinsic cutoff is independent of the number of residues or points being compared. For proteins having fewer than 100 residues, the 1% ρ falls below the intrinsic cutoff, so that for very small proteins, geometrically significant similarity can often occur by chance. We believe these ideas will be helpful in judging success in NMR structure determination and protein folding modeling.
© 1995 Wiley‐Liss, Inc.
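
The building blocks named in the abstract (RMSD after optimal rigid body superposition, scaling to unit radius of gyration, and comparison against a mirror image) can be illustrated with a minimal sketch. This is not the authors' code, and it deliberately does not compute ρ, whose formula is not given in the abstract. It assumes NumPy, equally weighted points already in one-to-one correspondence, and the standard Kabsch SVD procedure for the superposition; all function names are hypothetical.

```python
import numpy as np


def radius_of_gyration(x):
    """Root-mean-square distance of the points from their centroid."""
    centered = x - x.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum() / len(x)))


def scale_to_unit_rg(x):
    """Center the points and rescale them to unit radius of gyration."""
    centered = x - x.mean(axis=0)
    return centered / radius_of_gyration(centered)


def mirror_image(x):
    """Reflect the points through the yz-plane (flip the x coordinate)."""
    return x * np.array([-1.0, 1.0, 1.0])


def superposed_rmsd(a, b):
    """RMSD after optimal translation and proper rotation (Kabsch, via SVD).

    Assumes a and b are (N, 3) arrays with row i of a matched to row i of b.
    """
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # Forbid reflections: if the best orthogonal fit is improper,
    # the smallest singular value enters with a negative sign.
    if np.linalg.det(u @ vt) < 0:
        s = s.copy()
        s[-1] = -s[-1]
    sum_sq = (a ** 2).sum() + (b ** 2).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(sum_sq, 0.0) / len(a)))


def closer_than_mirror(a, b):
    """True if a is more similar to b than to b's mirror image,
    after both have been scaled to unit radius of gyration."""
    a1, b1 = scale_to_unit_rg(a), scale_to_unit_rg(b)
    return superposed_rmsd(a1, b1) < superposed_rmsd(a1, mirror_image(b1))
```

Because reflections are excluded from the superposition, a chiral point set has a nonzero RMSD against its own mirror image, which is what makes the mirror-image comparison usable as a reference level.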
