Role of low-complexity sequences in the formation of novel protein coding sequences.

Low-complexity sequences are extremely abundant in eukaryotic proteins, for reasons that remain unclear. One hypothesis is that they contribute to the formation of novel coding sequences and thereby facilitate the generation of novel protein functions. Here, we test this hypothesis by examining the low-complexity sequence content of proteins of different ages. We show that recently emerged proteins contain more low-complexity sequences than older proteins and that these sequences often form functional domains. These data are consistent with the idea that low-complexity sequences may play a key role in the emergence of novel genes.
