Browsing repeats in genomes: Pygram and an application to non-coding region analysis

Background
A large number of studies on genome sequences have revealed the major role played by repeated sequences in the structure, function, dynamics and evolution of genomes. In-depth repeat analysis requires specialized methods, including visualization techniques, to achieve optimum exploratory power.

Results
This article presents Pygram, a new visualization application for investigating the organization of repeated sequences in complete genome sequences. The application projects data from a repeat index file onto the analysed sequences and, by combining this principle with a query system, can locate repeated sequences with specific properties. In short, Pygram provides an efficient graphical browser for studying repeats. Implementation of the complete configuration is illustrated in an analysis of CRISPR structures in Archaea genomes and the detection of horizontal transfer between Archaea and viruses.

Conclusion
By proposing a new visualization environment for analysing repeated sequences, this application aims to increase the efficiency of laboratories investigating repeat organization in single genomes or across several genomes.
