Context characterization of amino acid homorepeats using evolution, position, and order

Amino acid repeats, or homorepeats, are low-complexity protein motifs consisting of tandem repetitions of a single amino acid. Their presence and relative number vary across proteomes, and some studies have tried to address this variation proteome by proteome. In this work, we present a full characterization of amino acid homorepeats across evolution. We studied the presence and differential usage of each possible homorepeat in proteomes from various taxonomic groups, using clusters of very similar proteins to eliminate redundancy. We also addressed the position of each amino acid repeat within proteins and the order of co-occurring amino acid repeats. As a result, we present evidence of the uneven evolution of homorepeats, as well as of the functional implications of their relative position in proteins. We discuss some of these cases in their taxonomic context. Collectively, our results show evolutionary and positional signals suggesting that homorepeats have biological functions, likely creating nonspecific protein interactions or modulating specific interactions in a context-dependent manner. In conclusion, our work supports the functional importance of homorepeats and establishes a basis for the study of other low-complexity repeats. Proteins 2017; 85:709–719. © 2016 Wiley Periodicals, Inc.
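The abstract describes the analysis only at a high level. The Python sketch below illustrates one way to detect homorepeats in a sequence, place each one by relative position, record the N- to C-terminal order of co-occurring repeats, and count per-residue usage across a proteome. It rests on assumptions that are not stated in the abstract and is not the authors' pipeline. The minimum run length (`MIN_LEN = 4`), the split of each protein into thirds, and all function names are hypothetical and chosen for illustration. Redundancy removal by clustering very similar proteins is assumed to have happened upstream.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical threshold: a run of at least MIN_LEN identical residues is
# treated as a homorepeat. The abstract does not state the cutoff used.
MIN_LEN = 4

# One residue letter followed by at least MIN_LEN - 1 further copies of itself.
# Greedy matching makes each hit a maximal run.
HOMOREPEAT = re.compile(r"([A-Z])\1{%d,}" % (MIN_LEN - 1))


@dataclass
class Homorepeat:
    residue: str    # the repeated amino acid
    start: int      # 0-based index of the first residue in the run
    length: int     # number of tandem copies
    rel_pos: float  # run midpoint scaled to [0, 1] along the protein


def find_homorepeats(seq: str) -> list[Homorepeat]:
    """Return all maximal single-residue runs of length >= MIN_LEN, N- to C-terminus."""
    seq = seq.upper()
    n = len(seq)
    repeats = []
    for m in HOMOREPEAT.finditer(seq):
        start, end = m.start(), m.end()
        midpoint = (start + end - 1) / 2
        rel = midpoint / (n - 1) if n > 1 else 0.0
        repeats.append(Homorepeat(m.group(1), start, end - start, rel))
    return repeats


def region(rel_pos: float) -> str:
    """Bin a relative position into thirds (an assumed, illustrative binning)."""
    if rel_pos < 1 / 3:
        return "N-term"
    if rel_pos < 2 / 3:
        return "middle"
    return "C-term"


def repeat_order(seq: str) -> str:
    """Order of co-occurring homorepeats, e.g. 'polyQ-polyA' when a Q run precedes an A run."""
    return "-".join("poly" + r.residue for r in find_homorepeats(seq))


def usage(proteome: dict[str, str]) -> Counter:
    """Count, per residue, how many proteins contain at least one homorepeat of it."""
    counts: Counter = Counter()
    for seq in proteome.values():
        # A set ensures each protein contributes at most once per residue.
        counts.update({r.residue for r in find_homorepeats(seq)})
    return counts
```

For example, `repeat_order("MSQQQQQQPTGAAAAAKLSE")` returns `"polyQ-polyA"`. In that sequence, a six-residue polyQ falls in the N-terminal third and a five-residue polyA falls in the C-terminal third. Binning by relative position rather than absolute index lets proteins of different lengths be compared. That kind of comparison is what a positional analysis across proteomes requires.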
