Amino Acid Reiterations in Yeast Are Overrepresented in Particular Classes of Proteins and Show Evidence of a Slippage-Like Mutational Process

Abstract. Long amino acid repeats are often observed in eukaryotic proteins. In humans, several neurological disorders are caused by proteins containing abnormally long polyglutamine tracts. However, no systematic analysis of a large collection of wild-type proteins has investigated the relationship between reiterations of particular amino acids and protein function, the possible mechanisms that generate these regions, or the contribution of selection in restricting their genomic distribution. We have used baker's yeast open reading frames to study these questions. The most abundant amino acid repeats found in yeast proteins are repeats of glutamine, asparagine, aspartic acid, glutamic acid, and serine. Different amino acid repeats are concentrated in different classes of proteins. Acidic and polar amino acid repeats are significantly associated with transcription factors and protein kinases, while serine repeats are significantly associated with membrane transporter proteins. In most cases the codon structures encoding the repeats at the gene level show a significant bias toward long tracts of one of the possible codons, suggesting that trinucleotide slippage has played an important role in generating these reiterations. However, many codon structures, particularly those encoding serine repeats, show no evidence of slippage. The distributions of codon repeats within proteins and between coding and noncoding regions of the genome, and of amino acid repeats between proteins with different functions, suggest that repeats of these kinds are subject to strong selection.
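
The kind of analysis described above can be illustrated with a minimal sketch. It is not the authors' method: the function names `find_aa_repeats` and `longest_codon_run`, the minimum repeat length of 5 residues, and the toy sequences are all assumptions made for illustration. The first function locates uninterrupted runs of a single amino acid in a protein; the second reports the longest tract of one identical codon within the DNA encoding such a run, which is the sort of signal a slippage-like process would be expected to leave.

```python
import re
from itertools import groupby


def find_aa_repeats(protein, min_len=5):
    """Return (residue, start, length) for every uninterrupted run of a
    single amino acid that is at least min_len residues long.
    The default min_len is an illustrative assumption, not a value from the paper."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(protein)]


def longest_codon_run(cds, start, length):
    """Return (codon, run_length) for the longest tract of one identical
    codon within the coding sequence of a repeat.
    start and length are given in residues, as returned by find_aa_repeats."""
    codons = [cds[3 * i:3 * i + 3] for i in range(start, start + length)]
    runs = [(codon, len(list(group))) for codon, group in groupby(codons)]
    return max(runs, key=lambda run: run[1])


# Toy example (hypothetical sequences): a run of six glutamines
# encoded by five CAG codons followed by one CAA codon.
protein = "MKQQQQQQSA"
cds = "ATGAAA" + "CAG" * 5 + "CAA" + "TCTGCT"

repeats = find_aa_repeats(protein)
residue, start, length = repeats[0]
codon, run = longest_codon_run(cds, start, length)
```

In this toy case the six-residue glutamine repeat is encoded mostly by a single codon. A repeat encoded by a mixture of synonymous codons would give a short longest run instead, which is the pattern the abstract reports for many serine repeats. Deciding whether a given bias is significant requires a statistical comparison that this sketch does not attempt.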
