Local‐scale repetitiveness in amino acid use in eukaryote protein sequences: A genomic factor in protein evolution

We showed previously that the use of arginine versus lysine residues in eukaryote proteins is correlated positively with local GC content of the genome within ≈50 residues. Cumulative analyses show that a tendency toward self‐clustering (or repetitive use) generally holds for all types of amino acids except certain hydrophobic types. The degree to which each amino acid is used recurrently is weak for ancient proteins (or protein domains), those that are conserved through both eukaryotes and prokaryotes, but strong for modern proteins, which are unique to organisms of particular phyla. These findings support the idea that repetitiveness arises from a propensity of genomic DNA to undergo tandem duplication. A protein sequence with high repetitiveness tends to be unique in homology searches, which may indicate weaker constraints and, hence, a more arbitrary use of amino acids. Simulation analyses suggest that tandem duplications on a very small scale (1 or 2 codons) are an important causal factor in maintaining repetitiveness in the presence of concomitant substitutive point mutations. For yeast proteins, ≈1.3 duplication events per 1,000 residues are likely to occur on average, whereas 10 substitution mutation events occur. It also is suggested that duplication enhances the probability of occurrence of some peptide motifs, such as those found in zinc fingers and segments with extreme physicochemical characteristics, and, thus, that local repetitiveness is a genomic factor influencing the evolution of eukaryote proteins. Proteins 1999;37:284–292. ©1999 Wiley‐Liss, Inc.
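The interplay between small tandem duplications and point substitutions described above can be illustrated with a toy simulation. The following is a minimal sketch, not the authors' procedure: it works on residues rather than codons, draws residues uniformly from the 20 amino acids, trims the sequence back to a fixed length after each duplication, and uses a simple within-window identity fraction as a stand-in for the repetitiveness measure. All function names, the window of 50, and the default event counts (roughly mirroring the ≈1.3 to 10 ratio) are assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, used uniformly (assumption)


def tandem_duplicate(seq, rng, max_unit=2):
    """Duplicate a unit of 1..max_unit residues in tandem, then trim to the original length."""
    k = rng.randint(1, max_unit)
    i = rng.randrange(len(seq) - k + 1)
    out = seq[:i + k] + seq[i:i + k] + seq[i + k:]
    return out[:len(seq)]  # fixed length is a simplifying assumption


def substitute(seq, rng):
    """Replace one randomly chosen residue with a uniformly drawn residue."""
    i = rng.randrange(len(seq))
    return seq[:i] + [rng.choice(AMINO_ACIDS)] + seq[i + 1:]


def local_identity(seq, window=50):
    """Fraction of residue pairs at most `window` positions apart that are identical."""
    same = total = 0
    for i in range(len(seq)):
        for j in range(i + 1, min(i + window + 1, len(seq))):
            total += 1
            same += seq[i] == seq[j]
    return same / total if total else 0.0


def simulate(length=1000, rounds=100, dup_per_round=1, sub_per_round=8, seed=0):
    """Start from a random sequence and apply duplications and substitutions each round."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    for _ in range(rounds):
        for _ in range(dup_per_round):
            seq = tandem_duplicate(seq, rng)
        for _ in range(sub_per_round):
            seq = substitute(seq, rng)
    return seq


if __name__ == "__main__":
    with_dup = local_identity(simulate(dup_per_round=1))
    without_dup = local_identity(simulate(dup_per_round=0))
    print(f"local identity with duplication:    {with_dup:.4f}")
    print(f"local identity without duplication: {without_dup:.4f}")
```

Comparing the two printed values for different `dup_per_round` and `sub_per_round` settings gives a rough sense of how much duplication is needed to keep local repetitiveness above the substitution-only baseline in this simplified model.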
