A method to recognize distant repeats in protein sequences

An automated algorithm is presented that delineates protein sequence fragments displaying similarity to one another. The method combines the selection of a number of local, nonoverlapping sequence alignments with the highest similarity scores and a graph-theoretical approach that elucidates consistent start and end points of the fragments comprising one or more ensembles of related subsequences. The procedure allows the simultaneous identification of different types of repeats within one sequence. A multiple alignment of the resulting fragments is performed, and a consensus sequence is derived from the ensemble(s). Finally, a profile is constructed from the multiple alignment to detect possible additional, more distant members within the sequence. The method tolerates mutations in the repeats as well as insertions and deletions. The sequence spans between the various repeats or repeat clusters may be of different lengths. The technique has been applied to a number of proteins in which the repeating fragments had been derived from information additional to the protein sequences. © 1993 Wiley‐Liss, Inc.
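As a rough illustration of the first step only (finding a high-scoring local alignment of a sequence against itself), the following Python sketch may help. It is not the published procedure: the function name `best_self_alignment`, the flat match/mismatch/gap scores, and the restriction to a single best alignment are all assumptions made for brevity. A realistic version would presumably use an amino acid substitution matrix, collect several nonoverlapping alignments, and go on to the graph-theoretical, multiple-alignment, and profile steps, none of which are shown here.

```python
def best_self_alignment(seq, match=2, mismatch=-1, gap=-2):
    """Best local alignment of seq against itself, off the main diagonal.

    Hypothetical helper with an assumed flat scoring scheme.
    Returns (score, (start1, end1), (start2, end2)) with 0-based,
    end-exclusive spans. Only cells with j > i are filled, so the
    trivial alignment of the sequence with itself is excluded.
    """
    n = len(seq)
    # H[i][j] = best score of a local alignment ending at seq[i-1], seq[j-1]
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best = (0, 0, 0)  # (score, i, j)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align two residues
                          H[i - 1][j] + gap,     # gap in second fragment
                          H[i][j - 1] + gap)     # gap in first fragment
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)

    # Trace back from the best cell until the score drops to zero.
    score, i, j = best
    end1, end2 = i, j
    while i > 0 and j > i and H[i][j] > 0:
        s = match if seq[i - 1] == seq[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score, (i, end1), (j, end2)


# Toy sequence: the same 10-residue fragment twice, separated by a spacer.
toy = "ACDEFGHIKL" + "PPPP" + "ACDEFGHIKL"
print(best_self_alignment(toy))  # (20, (0, 10), (14, 24))
```

Note that this sketch does not prevent the two returned fragments from overlapping each other in highly repetitive sequences; handling that case, and extracting further alignments beyond the first, is left out deliberately.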

[18]  A Klug,et al.  Repetitive zinc‐binding domains in the protein transcription factor IIIA from Xenopus oocytes. , 1985, The EMBO journal.

[19]  P. Argos,et al.  The primary structure of human hemopexin deduced from cDNA sequence: evidence for internal, repeating homology. , 1985, Nucleic acids research.

[20]  R. Hynes,et al.  Repeating modular structure of the fibronectin gene: relationship to protein structure and subunit variation. , 1985, Proceedings of the National Academy of Sciences of the United States of America.

[21]  P Argos,et al.  The primary structure of transcription factor TFIIIA has 12 consecutive repeats , 1985, FEBS letters.

[22]  M Gribskov,et al.  Sigma factors from E. coli, B. subtilis, phage SP01, and phage T4 are homologous proteins. , 1986, Nucleic acids research.

[23]  C. Luo,et al.  Structure and evolution of the apolipoprotein multigene family. , 1986, Journal of molecular biology.

[24]  M. Boguski,et al.  Evolution of the apolipoproteins. Structure of the rat apo-A-IV gene and its relationship to the human genes for apo-A-I, C-III, and E. , 1986, The Journal of biological chemistry.

[25]  P. Argos,et al.  Fingers and helices , 1986, Nature.

[26]  P Argos,et al.  A sensitive procedure to compare amino acid sequences. , 1987, Journal of molecular biology.

[27]  M. Waterman,et al.  A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons. , 1987, Journal of molecular biology.

[28]  C. DeLisi,et al.  Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins. , 1987, Journal of molecular biology.

[29]  R. Hynes,et al.  Organization of the fibronectin gene provides evidence for exon shuffling during evolution. , 1987, The EMBO journal.

[30]  P Argos,et al.  Repeating structure of chick tropoelastin revealed by complementary DNA cloning. , 1987, Biochemistry.

[31]  M. Sternberg,et al.  A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. , 1987, Journal of molecular biology.

[32]  A. D. McLachlan,et al.  Profile analysis: detection of distantly related proteins. , 1987, Proceedings of the National Academy of Sciences of the United States of America.

[33]  Webb Miller,et al.  A space-efficient algorithm for local similarities , 1990, Comput. Appl. Biosci..

[34]  D A Agard,et al.  Three-dimensional structure of the LDL receptor-binding domain of human apolipoprotein E. , 1991, Science.

[35]  P Argos,et al.  Protein sequence comparison: methods and significance. , 1991, Protein engineering.

[36]  I. Rayment,et al.  Molecular structure of an apolipoprotein determined at 2.5-A resolution. , 1991, Biochemistry.

[37]  P. Argos,et al.  Side-chain clusters in protein structures and their role in protein folding. , 1991, Journal of molecular biology.

[38]  U. Kulkarni-Kale,et al.  Sequence alignment approach to pick up conformationally similar protein fragments. , 1992, Journal of molecular biology.