On protein structure alignment under distance constraint

We study the protein structure comparison problem in which each protein is modeled as a sequence of 3D points, and a contact edge joins every pair of these points that are sufficiently close. Given two proteins represented this way, the task is to find a subset of points from each protein, together with a bijective matching between the two subsets, that maximizes either (A) the size of the subsets (the LCP problem) or (B) the number of edges present simultaneously in both subsets (the CMO problem), subject to the requirement that only points within a specified proximity of each other may be matched. The general CMO problem (without the proximity requirement) is known to be hard to approximate. With the proximity requirement, however, it is known that approximate solutions can be obtained efficiently if a minimum inter-residue distance is imposed on the input. Our main result is that the CMO problem under these conditions (1) is NP-hard, but (2) admits a PTAS. The rest of the paper presents algorithms for the LCP problem that improve on known results.
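To make the two objectives concrete, the following is a minimal Python sketch of the model described above, not an algorithm from the paper. It builds contact edges, checks that a candidate matching is feasible under the proximity requirement, and evaluates the LCP and CMO objectives for that matching. The function names, the dictionary representation of a matching, and the threshold parameters are illustrative assumptions. The sketch also assumes that both proteins are already expressed in a common coordinate frame, and it does not enforce any further constraints (such as sequence order) that the full problem definition may impose.

```python
from itertools import combinations
from math import dist


def contact_edges(points, contact_threshold):
    """Return index pairs (i, j), i < j, whose points are within contact_threshold."""
    return {
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if dist(points[i], points[j]) <= contact_threshold
    }


def is_feasible(p, q, matching, proximity):
    """Check that matching (dict: index in p -> index in q) is one-to-one
    and only pairs points that lie within `proximity` of each other."""
    targets = list(matching.values())
    if len(set(targets)) != len(targets):
        return False  # two points of p mapped to the same point of q
    return all(dist(p[i], q[j]) <= proximity for i, j in matching.items())


def lcp_score(matching):
    """Objective (A): number of matched point pairs."""
    return len(matching)


def cmo_score(edges_p, edges_q, matching):
    """Objective (B): number of contact edges of p whose matched images
    also form a contact edge of q."""
    count = 0
    for i, k in edges_p:
        if i in matching and k in matching:
            a, b = matching[i], matching[k]
            if (min(a, b), max(a, b)) in edges_q:
                count += 1
    return count


# Hypothetical toy input: two three-point "proteins".
p = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
q = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (5.0, 0.0, 0.0)]

edges_p = contact_edges(p, contact_threshold=1.5)
edges_q = contact_edges(q, contact_threshold=1.5)

matching = {0: 0, 1: 1}  # candidate matching, chosen by hand
feasible = is_feasible(p, q, matching, proximity=0.5)
lcp = lcp_score(matching)
cmo = cmo_score(edges_p, edges_q, matching)
```

In this toy input the third points of the two proteins lie too far apart to be matched, so extending the matching with `2: 2` would violate the proximity requirement. The sketch only evaluates a given matching; finding an optimal one is the hard part that the paper addresses.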
