Optimal Pairwise Alignment of Fixed Protein Structures in Subquadratic Time

The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith-Waterman dynamic programming algorithm, which runs in time proportional to the product of the lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, it becomes a bottleneck when the Smith-Waterman routine is run many times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic-time algorithm for computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. The algorithm presented in this article can be used to significantly improve the speed-accuracy tradeoff in a number of popular protein structure alignment methods.
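For orientation, the quadratic baseline that the abstract refers to can be sketched as follows. This is an illustrative dynamic program, not the subquadratic algorithm of the article: it counts the largest number of residue pairs that lie within the cutoff while preserving sequence order, for two structures that are already superimposed. The function name `max_aligned_pairs`, the representation of residues as 3D coordinate tuples, and the simplified scoring (one point per pair within the cutoff, no gap penalties) are assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def max_aligned_pairs(a, b, cutoff):
    """Count the most order-preserving residue pairs within `cutoff`.

    `a` and `b` are sequences of (x, y, z) coordinates of two proteins
    that are assumed to be already superimposed (fixed structures).
    Runs in O(len(a) * len(b)) time, i.e. the quadratic baseline.
    """
    m = len(b)
    prev = [0] * (m + 1)  # DP row for the first i-1 residues of a
    for i in range(1, len(a) + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            # Pair residue i with residue j only if they are close enough.
            close = dist(a[i - 1], b[j - 1]) <= cutoff
            match = prev[j - 1] + 1 if close else 0
            # Otherwise skip a residue in one of the two proteins.
            cur[j] = max(prev[j], cur[j - 1], match)
        prev = cur
    return prev[m]


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    b = [(0.1, 0.0, 0.0), (5.0, 0.0, 0.0), (2.1, 0.0, 0.0)]
    print(max_aligned_pairs(a, b, cutoff=0.5))
```

The nested loop is what makes each call quadratic in the protein lengths; a method that searches over many candidate superpositions repeats this cost once per superposition, which is the bottleneck the abstract describes.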
