Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation

An algorithm is described for the systematic characterization of the physico-chemical properties seen at each position in a multiple protein sequence alignment. The algorithm allows questions important in the design of mutagenesis experiments to be answered quickly, since positions in the alignment that show unusual or interesting residue substitution patterns can be rapidly identified. The strategy is based on a flexible set-based description of amino acid properties, which is used to define the conservation within any group of amino acids. Sequences in the alignment are gathered into subgroups on the basis of sequence similarity or of functional, evolutionary or other criteria. All pairs of subgroups are then compared to highlight positions that confer the unique features of each subgroup. The algorithm is implemented in the computer program AMAS (Analysis of Multiply Aligned Sequences), which provides a textual summary of the analysis and an annotated (boxed, shaded and/or coloured) multiple sequence alignment. The algorithm is illustrated by application to an alignment of 67 SH2 domains, where patterns of conserved hydrophobic residues that constitute the protein core are highlighted. An analysis of charge conservation across annexin domains identifies the locations at which conserved charges change sign. The algorithm simplifies the analysis of multiple sequence data by condensing the mass of information present, and thus allows the rapid identification of substitutions of structural and functional importance.
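The set-based idea can be illustrated with a minimal Python sketch. It is not the AMAS implementation: the property table, its memberships, the scoring rule (counting properties on which all residues at a position agree) and the function names `shared_properties`, `conservation` and `contrasting_positions` are all assumptions made for illustration. Each property is a set of one-letter amino acid codes, a group of residues is characterized by the properties that every member either has or lacks, and two subgroups are contrasted at positions where each is internally consistent but the two are in opposite states.

```python
# Illustrative property table: memberships are an assumption for this sketch,
# not the table used by AMAS.
PROPERTIES = {
    "hydrophobic": set("ACFGHIKLMTVWY"),
    "polar": set("CDEHKNQRSTWY"),
    "small": set("ACDGNPSTV"),
    "aromatic": set("FHWY"),
    "aliphatic": set("ILV"),
    "charged": set("DEHKR"),
    "positive": set("HKR"),
    "negative": set("DE"),
}


def shared_properties(residues):
    """Map each property on which all residues agree to its common state.

    True means every residue has the property, False means none does.
    Properties on which the residues disagree are left out.
    Gap characters are not treated specially: they simply lack every property.
    """
    shared = {}
    for name, members in PROPERTIES.items():
        states = {aa in members for aa in residues}
        if len(states) == 1:
            shared[name] = states.pop()
    return shared


def conservation(residues):
    """Score a group of residues by the number of properties they agree on."""
    return len(shared_properties(residues))


def column(alignment, i):
    """Residues found at position i of an alignment (list of equal-length strings)."""
    return [seq[i] for seq in alignment]


def contrasting_positions(group_a, group_b):
    """List (position, properties) where both subgroups are internally
    consistent for a property but hold it in opposite states."""
    result = []
    for i in range(len(group_a[0])):
        a = shared_properties(column(group_a, i))
        b = shared_properties(column(group_b, i))
        differing = sorted(p for p in a if p in b and a[p] != b[p])
        if differing:
            result.append((i, differing))
    return result


# Toy subgroups (hypothetical sequences): position 0 changes charge sign,
# position 1 stays aliphatic in both subgroups.
subgroup_a = ["KL", "RI"]
subgroup_b = ["EL", "DV"]
print(contrasting_positions(subgroup_a, subgroup_b))
# [(0, ['negative', 'positive'])]
```

In this toy case only position 0 is reported, which mirrors the kind of conserved charge reversal described for the annexin domains. A fuller treatment would also need to handle gaps and to apply this comparison to every pair of subgroups.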
