Significance of root-mean-square deviation in comparing three-dimensional structures of globular proteins.

In the study of globular protein conformations, one customarily measures the similarity in three-dimensional structure by the root-mean-square deviation (RMSD) of the Cα atomic coordinates after optimal rigid-body superposition. Even when the two protein structures each consist of a single chain having the same number of residues, so that the matching of Cα atoms is obvious, it is not clear how to interpret the RMSD. A very large value means they are dissimilar, and zero means they are identical in conformation, but at what intermediate values are they particularly similar or clearly dissimilar? While many workers in the field have chosen arbitrary cutoffs, and others have judged values of RMSD according to the observed distribution of RMSD for random structures, we propose a self-referential, non-statistical standard. We take two conformers to be intrinsically similar if their RMSD is smaller than the RMSD obtained when one of them is mirror inverted. Because the structures considered here are not arbitrary configurations of point atoms, but are compact, globular, polypeptide chains, our definition is closely related to similarity in radius of gyration and overall chain folding patterns. Being strongly similar in our sense implies that the radii of gyration must be nearly identical, the root-mean-square deviation in interatomic distances is linearly related to RMSD, and the two chains must have the same general fold. Only when the RMSD exceeds this level can parts of the polypeptide chain undergo nontrivial rearrangements while remaining globular. This enables us to judge when a prediction of a protein's conformation is "correct except for minor perturbations", or when the protein structures in an ensemble deduced from NMR experiments are "basically in mutual agreement".
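
As a rough illustration of the mirror-inversion criterion described above, the following Python sketch compares the RMSD of two conformers with the RMSD obtained after mirror-inverting one of them. It is not taken from the paper. It assumes that the optimal rigid-body superposition is computed with the Kabsch SVD method (the abstract does not name an algorithm), that the two inputs are (N, 3) arrays of matched Cα coordinates, and that NumPy is available. The function names kabsch_rmsd, mirror_image and is_intrinsically_similar are hypothetical, chosen only for this example.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD of two (N, 3) coordinate sets after optimal proper rotation and translation.

    Assumption: Kabsch superposition via SVD; reflections are explicitly excluded.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove the translational component by centering both sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1), never a reflection.
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q (row-vector convention) and measure the residual.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(Pc)))


def mirror_image(X):
    """Mirror-invert a structure by negating x; any other mirror plane is
    equivalent up to a rotation, which the superposition absorbs."""
    return np.asarray(X, dtype=float) * np.array([-1.0, 1.0, 1.0])


def is_intrinsically_similar(P, Q):
    """True if RMSD(P, Q) is smaller than RMSD(P, mirror image of Q)."""
    return bool(kabsch_rmsd(P, Q) < kabsch_rmsd(P, mirror_image(Q)))
```

For example, a chiral test chain such as a short helix compared with a slightly perturbed, rotated copy of itself should satisfy the criterion, whereas the same helix compared with its own mirror image should not, since mirror-inverting the second structure then restores an exact match.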