Low-complexity sequences and single amino acid repeats: not just "junk" peptide sequences.

For decades, proteins were thought to interact through a "lock and key" mechanism, which established a paradigm linking stable three-dimensional structure to biological function. As a consequence, any unstructured peptide was considered nonfunctional and assumed to evolve neutrally. Surprisingly, the peptides most commonly shared between eukaryotic proteomes are low-complexity sequences that, under most conditions, do not adopt a stable three-dimensional structure. However, because these sequences evolve rapidly and because size variation in a few of them can have deleterious effects, low-complexity sequences have been suggested to be targets of selection. Here we review evidence supporting the idea that these simple sequences should not be considered just "junk" peptides and that selection drives the evolution of many of them.
