Fast Detection of Common Geometric Substructure in Proteins

We consider the problem of identifying common three-dimensional substructures between proteins. Our method is based on comparing the shape of the alpha-carbon backbone structures of the proteins in order to find three-dimensional (3D) rigid motions that bring portions of the geometric structures into correspondence. We propose a geometric representation of protein backbone chains that is compact yet allows for similarity measures that are robust against noise and outliers. This representation encodes the structure of the backbone as a sequence of unit vectors, defined by each adjacent pair of alpha-carbons. We then define a measure of the similarity of two protein structures based on the root mean squared (RMS) distance between corresponding orientation vectors of the two proteins. Our measure has several advantages over measures that are commonly used for comparing protein shapes, such as the minimum RMS distance between the 3D positions of corresponding atoms in two proteins. A key advantage is that this new measure behaves well for identifying common substructures, in contrast with position-based measures where the nonmatching portions of the structure dominate the measure. At the same time, it avoids the quadratic space and computational difficulties associated with methods based on distance matrices and contact maps. We show applications of our approach to detecting common contiguous substructures in pairs of proteins, as well as the more difficult problem of identifying common protein domains (i.e., larger substructures that are not necessarily contiguous along the protein chain).
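The representation and measure described above can be illustrated with a minimal sketch. It assumes each backbone is given as a list of alpha-carbon (x, y, z) coordinates and that the correspondence between the two chains is already fixed, position by position. The function names `unit_vectors`, `rotate`, and `orientation_rms` are hypothetical and are not taken from the paper. The search for the rigid motion that minimizes the measure is not shown; the rotation is simply passed in.

```python
import math

def unit_vectors(ca_coords):
    """Encode a backbone as unit vectors between adjacent alpha-carbons."""
    vectors = []
    for (x0, y0, z0), (x1, y1, z1) in zip(ca_coords, ca_coords[1:]):
        dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0.0:
            raise ValueError("adjacent alpha-carbons coincide")
        vectors.append((dx / norm, dy / norm, dz / norm))
    return vectors

def rotate(v, R):
    """Apply a 3x3 rotation matrix R (list of rows) to a 3-vector v."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def orientation_rms(u, v, R=None):
    """RMS distance between corresponding unit vectors of two chains.

    If R is given, it is applied to the vectors of the first chain first.
    Assumption: u[i] corresponds to v[i]; no alignment search is done here.
    """
    if len(u) != len(v) or not u:
        raise ValueError("chains must be non-empty and of equal length")
    if R is not None:
        u = [rotate(a, R) for a in u]
    total = sum(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
        for a, b in zip(u, v)
    )
    return math.sqrt(total / len(u))

if __name__ == "__main__":
    # Toy coordinates for illustration only, not real protein data.
    chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 3.8)]
    # The same chain translated: the unit vectors are unaffected by translation.
    chain_b = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in chain_a]
    print(orientation_rms(unit_vectors(chain_a), unit_vectors(chain_b)))
```

Because the vectors are differences of positions, translation drops out and only the rotational part of a rigid motion matters in this sketch. Two unit vectors are never more than 2 apart, so a single mismatched pair contributes a bounded amount to the sum, whereas a position-based distance has no such bound.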
