Using Iterative Dynamic Programming to Obtain Accurate Pairwise and Multiple Alignments of Protein Structures

We show how a basic pairwise alignment procedure can be improved to align conserved structural regions more accurately, both by using variable, position-dependent gap penalties that depend on secondary structure and by taking the consensus of a number of suboptimal alignments. These improvements, which are novel for structural alignment, are direct analogs of what is possible with normal sequence alignment. They are feasible for us because our basic structural alignment procedure, unlike others, is so similar to normal sequence alignment. We further present preliminary results showing how our procedure can be generalized to produce a multiple alignment of a family of structures. Our approach is based on finding a "median" structure by performing all possible pairwise alignments and then aligning every structure to it.
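The idea of position-dependent gap penalties can be illustrated with a minimal sketch of global dynamic programming in which each residue carries its own gap cost. This is not the authors' implementation: the function names `gap_penalties_from_ss` and `align`, the linear (per-residue) gap model, the one-letter secondary-structure codes (`H`, `E`, anything else treated as loop), and the penalty values are all assumptions chosen for illustration. The similarity matrix `sim` is assumed to be supplied by whatever structural comparison is in use.

```python
from typing import List, Sequence, Tuple


def gap_penalties_from_ss(ss: str, in_element: float = 2.0,
                          in_loop: float = 0.5) -> List[float]:
    """Assign each residue a gap penalty from its secondary-structure code.

    Assumption: 'H' (helix) and 'E' (strand) get the higher penalty and
    every other code is treated as loop. The default values are arbitrary.
    """
    return [in_element if code in "HE" else in_loop for code in ss]


def align(sim: Sequence[Sequence[float]],
          gap_a: Sequence[float],
          gap_b: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment maximizing similarity minus per-residue gap costs.

    sim[i][j] scores pairing residue i of A with residue j of B.
    gap_a[i] is the cost of leaving residue i of A unpaired, and
    gap_b[j] is the cost of leaving residue j of B unpaired.
    Returns the best score and the list of paired (i, j) indices.
    """
    n, m = len(gap_a), len(gap_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]

    # Borders: a prefix of one structure aligned entirely against gaps.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_a[i - 1]
        move[i][0] = "up"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_b[j - 1]
        move[0][j] = "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (score[i - 1][j - 1] + sim[i - 1][j - 1], "diag"),  # pair i, j
                (score[i - 1][j] - gap_a[i - 1], "up"),             # skip A[i]
                (score[i][j - 1] - gap_b[j - 1], "left"),           # skip B[j]
            ]
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the corner using the recorded moves.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    # Toy input: A has a loop residue between two structured residues.
    sim = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
    gap_a = gap_penalties_from_ss("HLE", in_element=1.0, in_loop=0.1)
    gap_b = [1.0, 1.0]
    print(align(sim, gap_a, gap_b))
```

In this toy input the cheap loop penalty lets the middle residue of A be left unpaired, so its first and last residues pair with the two residues of B. With a uniform penalty the same recurrence reduces to ordinary global sequence alignment, which is the similarity the abstract relies on.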
