Length and sequence dependent accumulation of simple sequence repeats in vertebrates: potential role in genome organization and regulation.

Simple sequence repeats (SSRs), or microsatellites, are tandemly repeated short DNA sequence motifs that are abundant in higher eukaryotes. The enrichment of SSRs with increasing genome complexity points to positive selection and functional relevance. We analyzed the genomes of 24 organisms to find features that may help explain the functional relevance of SSRs. Of the 501 possible SSRs, only 73 show length-specific enrichment. We also noticed that ~45 bp is the optimum length for a majority of them, particularly in the human genome. Finally, we observed a non-random distribution of ACG and CCG repeats, which are enriched around transcriptional start sites (TSSs) in several species. Taken together, these results suggest that SSRs are functionally relevant, with a potential regulatory role. We propose that such repeats are evolving under positive selection pressure, like any other functional element in the genome.
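The figure of 501 possible SSRs can be reproduced with a short enumeration. The sketch below is an illustration only: it assumes, although the abstract does not say so, that motifs of 1 to 6 bp are considered, that motifs which are themselves repeats of a shorter unit are excluded, and that motifs related by cyclic shift or reverse complement are counted as one class. The function names `is_primitive`, `canonical`, and `motif_classes` are hypothetical and are not taken from the paper.

```python
from itertools import product

# Translation table for complementing a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(motif):
    """True if the motif is not a tandem repeat of a shorter unit (e.g. ATAT)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def canonical(motif):
    """Smallest string among all cyclic shifts of the motif and of its
    reverse complement; used as the representative of the motif class."""
    rc = motif.translate(COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (motif, rc) for i in range(len(s))]
    return min(rotations)


def motif_classes(max_len=6):
    """Map each motif length to its set of canonical motif classes."""
    classes = {}
    for n in range(1, max_len + 1):
        motifs = ("".join(p) for p in product("ACGT", repeat=n))
        classes[n] = {canonical(m) for m in motifs if is_primitive(m)}
    return classes


if __name__ == "__main__":
    counts = {n: len(c) for n, c in motif_classes(6).items()}
    print(counts)                # classes per motif length
    print(sum(counts.values()))  # total number of motif classes
```

Under these assumptions the per-length counts are 2, 4, 10, 33, 102, and 350, which sum to 501. The same canonical form places CGG in the CCG class, so a motif and its reverse complement are counted together.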
