Analysis of transposable element sequences using CENSOR and RepeatMasker

Eukaryotic genomes are rich in repetitive DNA, particularly transposable elements (TEs), and accordingly a number of computational methods have been developed to identify TEs in genomic sequences. We present here a survey of two of the most readily available and widely used bioinformatics applications for the detection, characterization, and analysis of TE sequences in eukaryotic genomes: CENSOR and RepeatMasker. For each program, we describe its availability, input, output, and underlying algorithms, and we give specific examples of its use. Both CENSOR and RepeatMasker detect TE sequences by homology to previously characterized elements. Several other classes of methods are available for analyzing repetitive DNA, including de novo methods, which compare genomic sequences against themselves; class-specific methods, which use the structural characteristics of particular classes of elements to aid their identification; and pipeline methods, which combine some or all of these approaches. We briefly consider the strengths and weaknesses of these classes of methods, emphasizing how they complement one another in the analysis of eukaryotic repetitive DNA.
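Homology-based detection, the approach shared by CENSOR and RepeatMasker, amounts to aligning the genomic query against a library of known TE consensus sequences and masking regions that score above a threshold. The Python sketch below illustrates the idea with a plain Smith-Waterman local alignment and hard-masking with `N`. The function names, the toy scores, the `min_score` cutoff and the one-entry library are illustrative assumptions, not the parameters or code of either program, which rely on optimized aligners, substitution matrices and curated libraries such as Repbase.

```python
from typing import Dict, Tuple

# Toy scoring scheme (assumption): real tools use substitution matrices
# and affine gap penalties rather than these fixed values.
MATCH, MISMATCH, GAP = 2, -3, -5


def smith_waterman(query: str, target: str) -> Tuple[int, int, int]:
    """Best local alignment of `target` (a TE consensus) within `query`.

    Returns (score, start, end), with start/end as 0-based, end-exclusive
    coordinates in `query`.
    """
    n, m = len(query), len(target)
    # H[i][j] = best score of a local alignment ending at query[i-1], target[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = MATCH if query[i - 1] == target[j - 1] else MISMATCH
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero.
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = MATCH if query[i - 1] == target[j - 1] else MISMATCH
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP:
            i -= 1  # query base aligned to a gap in the consensus
        else:
            j -= 1  # consensus base aligned to a gap in the query
    return best, i, best_i


def mask_repeats(genome: str, library: Dict[str, str], min_score: int = 20) -> str:
    """Hard-mask every local hit scoring >= min_score with 'N' (min_score must be > 0)."""
    seq = list(genome)
    for consensus in library.values():
        while True:
            score, start, end = smith_waterman("".join(seq), consensus)
            if score < min_score:
                break
            # 'N' never matches A/C/G/T, so each pass masks at least one
            # previously unmasked base and the loop terminates.
            seq[start:end] = "N" * (end - start)
    return "".join(seq)


library = {"toyTE": "ACGTTGCAAGGCTTAACCGGTACG"}  # hypothetical consensus
genome = "T" * 10 + library["toyTE"] + "G" * 10
print(smith_waterman(genome, library["toyTE"]))  # (48, 10, 34)
print(mask_repeats(genome, library))
```

Looping until no hit clears the threshold is what lets one consensus mask several copies; a single best-hit alignment would miss all but one.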
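De novo methods need no library: they look for sequence that recurs within the genome itself. A minimal sketch of that idea, assuming exact k-mer recurrence is an acceptable stand-in for a full self-comparison, counts every k-mer and lowercases (soft-masks) each position covered by a k-mer seen at least `min_count` times. The function name and the `k` and `min_count` values are hypothetical; published de novo tools build seed extension, alignment and family clustering on top of such counts.

```python
from collections import Counter


def soft_mask_repeated_kmers(seq: str, k: int = 12, min_count: int = 3) -> str:
    """Lowercase every position covered by a k-mer occurring >= min_count times.

    Exact k-mer recurrence is an assumed simplification of the all-against-all
    comparison a de novo repeat finder performs.
    """
    n_kmers = len(seq) - k + 1  # may be <= 0 for short input; ranges are then empty
    counts = Counter(seq[i:i + k] for i in range(n_kmers))
    covered = [False] * len(seq)
    for i in range(n_kmers):
        if counts[seq[i:i + k]] >= min_count:
            for p in range(i, i + k):
                covered[p] = True
    return "".join(c.lower() if cov else c for c, cov in zip(seq, covered))


element = "GATTACAGGC"  # hypothetical repeat family member
genome = element + "AAAAAAA" + element + "TTTTTTT" + element
print(soft_mask_repeated_kmers(genome, k=6))
```

Here the three copies of `element` come out in lowercase while the unique spacers stay uppercase. Because the sketch requires exact matches, diverged copies of an old TE family would be missed, which is one reason de novo and homology-based methods are complementary.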
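Class-specific methods exploit structural hallmarks instead. For LTR retrotransposons, the hallmark is a pair of long terminal repeats flanking an internal region, and LTRs typically begin with TG and end with CA. The sketch below, a simplification under stated assumptions, reports pairs of identical TG...CA segments of fixed length that recur at a plausible distance. The function name, the exact-identity requirement, the fixed `ltr_len` and the distance bounds are hypothetical toy choices. Real LTRs range from a few hundred bases to several kilobases, and older copies diverge, so dedicated tools search for similar rather than identical repeats and check additional features such as target-site duplications.

```python
from typing import Dict, List, Tuple


def ltr_pair_candidates(seq: str, ltr_len: int = 20, min_dist: int = 50,
                        max_dist: int = 2000) -> List[Tuple[int, int]]:
    """Return (left_start, right_start) for identical TG...CA segments of
    length ltr_len whose start positions lie min_dist..max_dist apart."""
    # Index every window of length ltr_len by its sequence; positions are
    # appended in increasing order.
    starts: Dict[str, List[int]] = {}
    for i in range(len(seq) - ltr_len + 1):
        starts.setdefault(seq[i:i + ltr_len], []).append(i)
    pairs = []
    for word, pos in starts.items():
        # Structural filter: LTRs typically begin with TG and end with CA.
        if not (word.startswith("TG") and word.endswith("CA")):
            continue
        for a in range(len(pos)):
            for b in range(a + 1, len(pos)):
                if min_dist <= pos[b] - pos[a] <= max_dist:
                    pairs.append((pos[a], pos[b]))
    return sorted(pairs)


ltr = "TGAACCGGTTAACCGGTTCA"  # hypothetical 20-bp LTR
element = ltr + "G" * 60 + ltr  # toy element: LTR, internal region, LTR
genome = "A" * 5 + element + "A" * 5
print(ltr_pair_candidates(genome))  # [(5, 85)]
```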
