Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression

MOTIVATION Genomic sequences are highly redundant and contain many types of repetitive DNA. Fuzzy tandem repeats (FTRs) are of particular interest: they are found in regulatory regions of eukaryotic genes and are reported to interact with transcription factors. However, accurate assessment of FTR occurrences in different genome segments requires a dedicated algorithm for efficient FTR identification and classification.

RESULTS We have obtained formulas for P-values of FTR occurrence and developed an FTR identification algorithm, implemented in the TandemSWAN software. Using TandemSWAN, we compared the structure and occurrence of FTRs with short period length (up to 24 bp) in coding and non-coding regions, including UTRs, heterochromatic, intergenic and enhancer sequences, of Drosophila melanogaster and Drosophila pseudoobscura. Tandems with a period of three and its multiples were found in coding segments, whereas FTRs with periods that are multiples of six are overrepresented in all non-coding segments. Periods of 5-7 and 11-14 bp were characteristic of the enhancer regions and other non-coding regions close to genes.

AVAILABILITY The TandemSWAN web page, stand-alone version and documentation can be found at http://bioinform.genetika.ru/projects/swan/www/

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
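As an illustration of what a "fuzzy" repeat with a given period means, the sketch below scores a sequence by the fraction of positions that match the base one period downstream. This is not the TandemSWAN algorithm and computes no P-value; the function name `period_match_fraction` and the example sequences are hypothetical, chosen only for illustration.

```python
def period_match_fraction(seq: str, period: int) -> float:
    """Return the fraction of positions i with seq[i] == seq[i + period].

    Illustrative score only (an assumption of this sketch, not the
    published method): 1.0 means an exact tandem repeat with this period,
    lower values mean more mismatches between adjacent copies.
    """
    seq = seq.upper()
    compared = len(seq) - period  # number of position pairs that can be compared
    if period <= 0 or compared <= 0:
        raise ValueError("period must be positive and shorter than the sequence")
    matches = sum(1 for i in range(compared) if seq[i] == seq[i + period])
    return matches / compared


# Hypothetical example sequences: four copies of a 6 bp unit,
# once exact and once with a single substitution in the third copy.
exact_repeat = "ACGTTG" * 4
fuzzy_repeat = "ACGTTGACGTTGACCTTGACGTTG"

for name, sequence in (("exact", exact_repeat), ("fuzzy", fuzzy_repeat)):
    print(name, period_match_fraction(sequence, 6))
```

The exact repeat scores 1.0 at period 6, while the single substitution lowers the score of the second sequence without destroying the periodic signal. A raw match fraction like this is biased toward long periods, because fewer positions are compared; that is one reason a statistical measure such as a P-value is needed to rank candidate repeats.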
