MGEScan-non-LTR: computational identification and classification of autonomous non-LTR retrotransposons in eukaryotic genomes

Computational methods for genome-wide identification of mobile genetic elements (MGEs) have become increasingly necessary for both genome annotation and evolutionary studies. Non-long terminal repeat (non-LTR) retrotransposons are a class of MGEs found in most eukaryotic genomes, sometimes in extremely high copy numbers. In this article, we present MGEScan-non-LTR, a computational tool that identifies non-LTR retrotransposons in genomic sequences using an approach inspired by the generalized hidden Markov model (GHMM). The model has three types of states, two representing protein domains encoded by non-LTR retrotransposons and one representing the inter-domain linker regions. Protein-domain states are scored with profile hidden Markov models, and linker states are scored with Gaussian Bayes classifiers. To classify each element into one of the 12 previously characterized clades within the same model, we defined separate states for the different clades. MGEScan-non-LTR was tested on the genome sequences of four eukaryotic organisms: Drosophila melanogaster, Daphnia pulex, Ciona intestinalis and Strongylocentrotus purpuratus. For the D. melanogaster genome, MGEScan-non-LTR found all known ‘full-length’ elements and simultaneously classified them into the clades CR1, I, Jockey, LOA and R1. Notably, for the D. pulex genome, in which no non-LTR retrotransposon had been annotated, MGEScan-non-LTR found a significantly larger number of elements than RepeatMasker did using the current version of the RepBase Update library. We also identified novel elements in the other two genomes, whose non-LTR retrotransposons have been only partially studied.
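
The clade-specific scoring described in the abstract can be illustrated with a minimal sketch. It is not the MGEScan-non-LTR implementation. It assumes that each candidate element already has one profile-HMM log score per clade for each of the two protein domains, and that the linker is summarized by a single feature (its length) with a per-clade Gaussian. The function names, the linker feature, and all numeric parameters below are hypothetical and chosen only for illustration.

```python
import math

def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution with the given mean and std."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)

# Hypothetical per-clade linker-length parameters (mean, std) in base pairs.
# Real values would have to be estimated from annotated elements.
LINKER_PARAMS = {
    "Jockey": (900.0, 150.0),
    "R1": (1500.0, 200.0),
}

def classify_element(domain_a_scores, domain_b_scores, linker_length,
                     linker_params=LINKER_PARAMS):
    """Pick the clade that maximizes domain A + linker + domain B log scores.

    domain_a_scores / domain_b_scores map clade name -> profile-HMM log score
    (assumed to be computed elsewhere, e.g. by a profile-HMM search tool).
    """
    best_clade, best_score = None, -math.inf
    for clade, (mean, std) in linker_params.items():
        # Skip clades for which either domain score is missing.
        if clade not in domain_a_scores or clade not in domain_b_scores:
            continue
        total = (domain_a_scores[clade]
                 + gaussian_log_pdf(linker_length, mean, std)
                 + domain_b_scores[clade])
        if total > best_score:
            best_clade, best_score = clade, total
    return best_clade, best_score

# Toy usage with made-up scores.
domain_a = {"Jockey": 50.0, "R1": 48.0}
domain_b = {"Jockey": 40.0, "R1": 41.0}
short_linker_call = classify_element(domain_a, domain_b, linker_length=920)
long_linker_call = classify_element(domain_a, domain_b, linker_length=1500)
```

In this toy setup the domain scores are nearly tied between the two clades, so the linker term decides the call. This shows why a separate linker state can help with classification, under the stated assumptions.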
