Efficient Algorithms and Software for Detection of Full-Length LTR Retrotransposons

LTR retrotransposons constitute one of the most abundant classes of repetitive elements in eukaryotic genomes. In this paper, we present a new algorithm for detecting full-length LTR retrotransposons in genomic sequences. The algorithm identifies regions of a genomic sequence that show the structural characteristics of LTR retrotransposons. Three key components distinguish our algorithm from current software: (i) a novel method that preprocesses the entire genomic sequence in linear time and then produces high-quality pairs of LTR candidates in constant time per pair, (ii) a thorough alignment-based evaluation of candidate pairs to ensure high-quality predictions, and (iii) a robust parameter set, encompassing both structural constraints and quality controls, that gives users a high degree of flexibility. We implemented the algorithm in a software program called LTR_par, which can be run on both serial and parallel computers. Validation of our software against the yeast genome indicates better results than existing software in both quality and performance. Additional validations are presented on rice BACs and the chimpanzee genome.
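To make components (ii) and (iii) concrete, the following is a minimal sketch of how a single candidate pair might be checked against structural constraints and an alignment-based quality threshold. It is not the LTR_par implementation: the parameter names and default values in `Params`, the function names, the scoring scheme, and the use of a plain Needleman-Wunsch global alignment are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Params:
    # Hypothetical parameter names and defaults, for illustration only.
    min_ltr_len: int = 100
    max_ltr_len: int = 2000
    min_distance: int = 600      # distance between the starts of the two LTRs
    max_distance: int = 15000
    min_identity: float = 0.70   # quality control on the aligned pair


def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Needleman-Wunsch global alignment with linear gap costs.

    Returns matches / alignment columns for one optimal alignment.
    """
    # Each cell holds (score, matches, columns) for the best prefix alignment.
    prev = [(j * gap, 0, j) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [(i * gap, 0, i)]
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            s, m, n = prev[j - 1]
            diag = (s + (match if same else mismatch), m + int(same), n + 1)
            s, m, n = prev[j]
            up = (s + gap, m, n + 1)
            s, m, n = cur[j - 1]
            left = (s + gap, m, n + 1)
            # Highest score wins; ties are broken by more matches.
            cur.append(max(diag, up, left))
        prev = cur
    _, matches, columns = prev[-1]
    return matches / columns if columns else 0.0


def accept_pair(seq: str, left_start: int, right_start: int,
                ltr_len: int, p: Optional[Params] = None) -> bool:
    """Apply structural constraints first, then the alignment-based check."""
    p = p or Params()
    distance = right_start - left_start
    if not (p.min_ltr_len <= ltr_len <= p.max_ltr_len):
        return False
    if not (p.min_distance <= distance <= p.max_distance):
        return False
    if distance < ltr_len:       # the two candidate LTRs would overlap
        return False
    left = seq[left_start:left_start + ltr_len]
    right = seq[right_start:right_start + ltr_len]
    if len(right) < ltr_len:     # candidate runs past the end of the sequence
        return False
    return global_identity(left, right) >= p.min_identity
```

Checking the inexpensive structural constraints before running the quadratic-time alignment keeps the costly step for pairs that already look plausible. The sketch does not attempt the linear-time preprocessing of component (i).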
