Fast Algorithms for Determining Protein Structure Similarity

The problem of identifying the common three-dimensional structure between two protein molecules has received considerable attention from both the biology community and algorithms researchers. A number of similarity measures have been proposed for this purpose, among them the RMS distance, measures based on geometric hashing, and measures based on contact map overlap. Recently, a new measure called the bottleneck matching metric has been used to measure the similarity between two drug or protein molecules. Although experimental studies have indicated that this metric is robust, all algorithms developed so far that are based on it have running times that are high-degree polynomials in the number of atoms in the protein molecules, which makes them infeasible for practical applications. In this paper we show that exploiting a very simple structural property of the α-carbon backbone structures of proteins considerably improves the running time of some of these algorithms. This improvement can be further combined with fairly standard algorithmic techniques such as randomization and/or an approximate matching scheme for bipartite graphs. The resulting algorithms have running times that are nearly linear in the number of atoms in the proteins being compared, making the bottleneck matching measure a viable candidate for practical applications.
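To make the bottleneck matching measure concrete, the following is a minimal sketch of the textbook way to compute it for two equal-size point sets under a fixed alignment: binary search over candidate distances, with a bipartite perfect-matching test at each step. It is not the nearly linear algorithm described in this paper, it does not search over rigid motions, and the function names `has_perfect_matching` and `bottleneck_distance` are hypothetical, chosen only for illustration.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def has_perfect_matching(adj, n):
    """Augmenting-path test: True if every left vertex can be matched."""
    match_right = [-1] * n  # match_right[j] = left vertex matched to right vertex j

    def try_augment(i, seen):
        for j in adj[i]:
            if not seen[j]:
                seen[j] = True
                if match_right[j] == -1 or try_augment(match_right[j], seen):
                    match_right[j] = i
                    return True
        return False

    return all(try_augment(i, [False] * n) for i in range(n))


def bottleneck_distance(a, b):
    """Smallest r for which a perfect matching uses only pairs at distance <= r."""
    if len(a) != len(b):
        raise ValueError("point sets must have equal size")
    n = len(a)
    if n == 0:
        return 0.0
    d = [[dist(p, q) for q in b] for p in a]
    # The optimum is always one of the pairwise distances.
    candidates = sorted({d[i][j] for i, j in product(range(n), repeat=2)})
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        r = candidates[mid]
        # Keep only the edges no longer than the current threshold.
        adj = [[j for j in range(n) if d[i][j] <= r] for i in range(n)]
        if has_perfect_matching(adj, n):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]
```

This naive version builds all n² pairwise distances and reruns the matching test at every threshold, which illustrates the kind of polynomial cost that becomes impractical for molecules with many atoms.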
