Amino acid repeats and the structure and evolution of proteins.

Many proteins contain repeats or runs of a single amino acid. The pathogenicity of some repeat expansions has fueled proteomic, genomic and structural explorations of homopolymeric runs, not only in humans but also in a wide variety of other organisms. Other types of repetitive amino acid structures exhibit more complex patterns than homopeptides. Irrespective of their precise organization, repetitive sequences are classed as low-complexity or simple sequences, because one or a few residues are particularly abundant in them. Prokaryotes show a relatively low frequency of simple sequences compared with eukaryotes. In eukaryotes, the percentage of proteins containing homopolymeric runs varies greatly from one group to another. For instance, within vertebrates, amino acid repeat frequency is much higher in mammals than in amphibians, birds or fishes. For some repeats, this is correlated with the GC-richness of the regions containing the corresponding genes. Homopeptides tend to occur in disordered regions of transcription factors or developmental proteins. They can trigger the formation of protein aggregates, particularly in 'disease' proteins. Simple sequences seem to evolve more rapidly than the rest of the protein/gene and may have a functional impact. They are therefore good candidates for promoting rapid evolutionary changes. All these diverse facets of homopolymeric runs are explored in this review.
