CAALIGN: a program for pairwise and multiple protein-structure alignment.

Coordinate superposition of proteins provides a structural basis for protein similarity and therefore complements sequence alignment. Structure-alignment methods face the problem of the large number of trials needed to find the optimal alignment. This article presents a method for rapid (subsecond) pairwise protein-structure alignment based on maximal Cα-atom superposition. For a pair of proteins, the algorithm can return multiple non-overlapping alignments, each 12 or more residues long, and these alignments do not depend on fold connectivity or secondary-structure content. The algorithm is equally effective for all protein fold types. It can also align proteins that contain no secondary-structure elements, as is the case when searching for common turn structures in proteins. It has high sensitivity: judged by SCOP classification, it ranks the true-positive results ahead of any false positives. It can find alignments between topologically different folds, and it reports the sequence alignment implied by the structure alignment. The algorithm has also been extended to multiple structure alignment in order to identify structures common to groups of proteins, including the nondegenerate set of proteins in the PDB. It is implemented in the program CAALIGN. This article presents results from pairwise structure alignment, from multiple structure alignment, and from the generation of common structure fragments found within the PDB by multiple structure alignment.
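The abstract does not describe CAALIGN's internals. The sketch below illustrates only the generic step that any Cα-superposition method depends on: given a set of already-paired Cα atoms, find the least-squares rotation and translation (the Kabsch 1976 method) and report the resulting RMSD. It assumes NumPy, and the function name `kabsch_superpose` is hypothetical. The search over candidate residue correspondences, which the abstract identifies as the costly part, is not shown.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, target: np.ndarray):
    """Least-squares superposition of `mobile` onto `target` (both N x 3 Cα arrays,
    row i of one paired with row i of the other).

    Returns (rotation, translation, rmsd) such that
    mobile @ rotation.T + translation best fits target.
    """
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("expected two N x 3 coordinate arrays of equal length")

    # Centre both coordinate sets on their centroids.
    mob_centroid = mobile.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    p = mobile - mob_centroid
    q = target - tgt_centroid

    # Covariance matrix H = P^T Q and its singular value decomposition.
    h = p.T @ q
    u, _s, vt = np.linalg.svd(h)

    # Flip the last axis if needed so the result is a proper rotation (det = +1),
    # not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Translation that maps the rotated mobile centroid onto the target centroid.
    translation = tgt_centroid - mob_centroid @ rotation.T

    fitted = mobile @ rotation.T + translation
    rmsd = float(np.sqrt(((fitted - target) ** 2).sum(axis=1).mean()))
    return rotation, translation, rmsd


# Usage: 12 synthetic "Cα" positions, rotated about z and shifted.
rng = np.random.default_rng(0)
target = rng.normal(scale=10.0, size=(12, 3))
angle = np.pi / 5
rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0,            0.0,           1.0]])
mobile = target @ rz.T + np.array([5.0, -3.0, 2.0])

rotation, translation, rmsd = kabsch_superpose(mobile, target)
print(f"RMSD after superposition: {rmsd:.3f} Å")
```

Because `mobile` is an exact rigid-body copy of `target`, the printed RMSD should be essentially zero. In a real alignment search, a score like this RMSD would be evaluated for each candidate set of paired residues, such as a run of 12 or more residues. The reflection check matters because an unconstrained least-squares fit could otherwise return a mirror image, which is not a physically meaningful superposition of two proteins.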
