Constraints and consequences of the emergence of amino acid repeats in eukaryotic proteins

Proteins with amino acid homorepeats can be detrimental to cells and are often associated with human diseases. Why, then, are homorepeats prevalent in eukaryotic proteomes? In yeast, homorepeats are enriched in proteins that are essential and pleiotropic and that buffer environmental insults. The presence of homorepeats increases the functional versatility of proteins by mediating protein interactions and facilitating spatial organization in a repeat-dependent manner. During evolution, homorepeats are preferentially retained in proteins with stringent proteostasis, which might minimize repeat-associated detrimental effects such as unregulated phase separation and protein aggregation. Homorepeats also facilitate rapid protein divergence through the accumulation of amino acid substitutions, which often affect linear motifs and post-translational modification sites. These substitutions may rewire protein interaction and signaling networks. Thus, homorepeats are distinct modules that are often retained in stringently regulated proteins. Their presence facilitates rapid exploration of the genotype–phenotype landscape of a population, thereby contributing to adaptation and fitness.
