ModuleOrganizer: detecting modules in families of transposable elements

Background: Most known eukaryotic genomes contain mobile, copied elements called transposable elements. In some species, these elements account for the majority of the genome sequence. They have been subject to many mutations and other genomic events (copies, deletions, captures) during transposition, and identifying these transformations remains difficult. The study of families of transposable elements is generally founded on a multiple alignment of their sequences, a critical step that is suited to transposons containing mostly localized nucleotide mutations. Many transposons that have lost their protein-coding capacity have undergone more complex rearrangements, which calls for more elaborate methods to characterize the architecture of sequence variations.

Results: In this study, we introduce the concept of a transposable element module, a flexible motif present in at least two sequences of a family of transposable elements and built on a succession of maximal repeats. The paper proposes an assembly method that works on the set of exact maximal repeats of a set of sequences to create such modules. It results in a graphical view of sequences segmented into modules, a representation that allows a flexible analysis of the transformations that have occurred between them. As a demonstration data set, we have chosen an in-depth analysis of the transposable element Foldback in Drosophila melanogaster. Comparison with multiple alignment methods shows that our method is more sensitive for highly variable sequences. The study of this family and of two other families, AtREP21 and SIDER2, reveals new copies of very different sizes and various combinations of modules, which shows the potential of our method.

Conclusions: ModuleOrganizer is available on the Genouest bioinformatics center at http://moduleorganizer.genouest.org
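To make the notion of exact maximal repeats shared between sequences concrete, here is a minimal sketch in Python. It is not the ModuleOrganizer implementation: the abstract does not say how repeats are computed, so this sketch assumes a simple quadratic pairwise scan, and the function names, the `min_len` threshold, and the toy sequences are all hypothetical. It reports every exact match between two sequences that cannot be extended to the left or to the right, then groups the matched words by the sequences and positions where they occur.

```python
from itertools import combinations


def maximal_exact_matches(a, b, min_len=4):
    """Return (start_a, start_b, length) for each exact match between
    a and b that cannot be extended on either side (illustrative only)."""
    matches = []
    prev = [0] * (len(b) + 1)  # common-suffix lengths for the previous row
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Length of the longest common suffix ending at a[i-1], b[j-1];
                # taking the longest suffix guarantees left-maximality.
                cur[j] = prev[j - 1] + 1
                at_end = i == len(a) or j == len(b)
                # Right-maximal: the next characters differ or a sequence ends.
                if cur[j] >= min_len and (at_end or a[i] != b[j]):
                    matches.append((i - cur[j], j - cur[j], cur[j]))
        prev = cur
    return matches


def shared_repeats(seqs, min_len=4):
    """Map each repeated word to the set of (sequence name, start) occurrences
    found by comparing every pair of sequences in the family."""
    repeats = {}
    for (name_a, a), (name_b, b) in combinations(seqs.items(), 2):
        for start_a, start_b, length in maximal_exact_matches(a, b, min_len):
            word = a[start_a:start_a + length]
            repeats.setdefault(word, set()).update(
                {(name_a, start_a), (name_b, start_b)}
            )
    return repeats


if __name__ == "__main__":
    # Toy family of two made-up sequences (not real transposon data).
    family = {"s1": "TTACGTACGGA", "s2": "CCACGTACGTT"}
    for word, occurrences in sorted(shared_repeats(family).items()):
        print(word, sorted(occurrences))
```

A module in the sense of the abstract would then be assembled from a succession of such repeats that appear in the same order in at least two sequences; that assembly step is not shown here. For real families, an index such as a suffix tree would scale better than this quadratic scan.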
