Terminal regions of a protein are a hotspot for low complexity regions (LCRs) and selection

In eukaryotes, a majority of protein-coding genes contain low-complexity regions (LCRs). Highly mutable LCRs are a source of adaptive variation, functional diversification, and evolutionary novelty. LCRs contribute to a wide range of neurodegenerative disorders; conversely, they also play pivotal roles in critical cellular functions such as morphogenesis, signaling, and transcriptional regulation. An interplay of selection and mutation governs the composition and length of LCRs. High GC content (%GC) and mutational mechanisms such as replication slippage generate length variability. Selection is nearly neutral for expansions and contractions within the normal length range but becomes purifying above a critical length. Because selection and mutation interact in complex ways, a better understanding of how the two coexist and operate is needed. Our findings indicate that site-specific positive selection and LCRs both prefer the terminal regions of a gene and co-occur in most Tetrapoda clades. Interestingly, positively selected sites (PSSs) are significantly enriched in LCRs in eight of the twelve clades studied, and PSSs are also significantly enriched in the polyQ region of MAML2 in five clades. We further found that PSSs have position-specific roles within a gene: terminal-PSS genes are enriched for adenyl nucleotide binding, whereas central-PSS genes are enriched for glycosaminoglycan binding. Moreover, central-PSS genes mainly participate in defense responses, whereas terminal-PSS genes show no specific functional enrichment. Across Tetrapoda, LCR-containing genes have significantly higher %GC and lower ω (dN/dS) than genes without repeats. The lower ω suggests that even though LCRs provide rapid functional diversity, the genes that carry them face intense purifying selection.
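The abstract does not spell out how %GC, the terminal/central split, or the "PSSs are significantly favored in LCRs" test are computed. The Python sketch below shows one plausible way to compute each quantity. The function names, the 20% terminal cut-off, and the choice of a one-sided Fisher's exact test are illustrative assumptions, not the authors' published pipeline.

```python
from math import comb


def gc_percent(cds: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence (the %GC metric)."""
    cds = cds.upper()
    if not cds:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in cds) / len(cds)


def region_of(site: int, length: int, terminal_fraction: float = 0.2) -> str:
    """Classify a 1-based residue position as 'terminal' or 'central'.

    ASSUMPTION (hypothetical cut-off): 'terminal' means the first or last
    `terminal_fraction` of the protein; the study's actual definition is
    not stated in the abstract.
    """
    if not 1 <= site <= length:
        raise ValueError("site outside protein")
    # Relative position from 0.0 (N-terminus) to 1.0 (C-terminus).
    rel = (site - 1) / (length - 1) if length > 1 else 0.0
    if rel < terminal_fraction or rel > 1 - terminal_fraction:
        return "terminal"
    return "central"


def pss_enrichment_p(n_sites: int, n_lcr: int, n_pss: int, n_pss_in_lcr: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Returns P(X >= n_pss_in_lcr): the chance that at least this many of the
    n_pss positively selected sites fall inside LCRs if PSSs were spread at
    random over all n_sites, n_lcr of which lie in LCRs.
    """
    total = comb(n_sites, n_pss)
    hits = sum(
        comb(n_lcr, k) * comb(n_sites - n_lcr, n_pss - k)
        for k in range(n_pss_in_lcr, min(n_lcr, n_pss) + 1)
    )
    return hits / total


if __name__ == "__main__":
    # Toy numbers, not data from the study: 3 of 10 sites lie in LCRs
    # and all 3 PSSs fall inside them.
    print(pss_enrichment_p(10, 3, 3, 3))  # 1/120, about 0.00833
```

With SciPy installed, `scipy.stats.fisher_exact([[pss_in_lcr, pss_outside], [other_in_lcr, other_outside]], alternative="greater")` on the corresponding 2×2 table should give the same one-sided p-value; the standard-library version is used here only to keep the sketch self-contained.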
