Evolution of Protein Domain Repeats in Metazoa

Repeats are ubiquitous elements of proteins and play important roles in cellular function and during evolution. However, repeats are also notoriously difficult to capture computationally, and large-scale studies have so far had difficulty linking the genetic causes, structural properties and evolutionary trajectories of protein repeats. Here we apply recently developed methods for repeat detection and analysis to a large dataset comprising over a hundred metazoan genomes. We find that repeats in larger protein families generally experience very few insertions or deletions (indels) of repeat units, but there is also a significant fraction of notably volatile outliers with very high indel rates. Analysis of structural data indicates that repeats with an open structure and independently folding units are more volatile and more likely to be intrinsically disordered. Such disordered repeats are also significantly enriched in sites with high functional potential, such as linear motifs. Furthermore, the most volatile repeats have high sequence similarity between their units. Since many volatile repeats also show signs of recombination, we conclude that they are often shaped by concerted evolution. Intriguingly, many of these conserved yet volatile repeats are involved in host-pathogen interactions, where they might foster fast but subtle adaptation in biological arms races.

Key Words: protein evolution, domain rearrangements, protein repeats, concerted evolution.
