An automated homology-based approach for identifying transposable elements

Background
Transposable elements (TEs) are mobile sequences found in nearly all eukaryotic genomes. They can move and replicate within a genome, often influencing genome evolution and gene expression. Identifying TEs is an important part of every genome project, and as the number of sequenced genomes rises rapidly, so does the need to identify the TEs within them. The ability to do this automatically and effectively, in a manner similar to the methods used for genes, is of increasing importance. Identifying TEs is difficult for several reasons, including their tendency to degrade over time and the fact that many do not adhere to a conserved structure. In this work, we describe a homology-based approach for the automatic identification of high-quality consensus TEs, intended for use in the analysis of newly sequenced genomes.

Results
We describe a homology-based approach for the automatic identification of TEs in genomes. Our modular approach depends on a thorough, high-quality library of representative TEs. Its implementation, named TESeeker, is BLAST-based but also uses the CAP3 assembly program, the ClustalW2 multiple sequence alignment tool, and numerous BioPerl scripts. We apply our approach to newly sequenced genomes and successfully identify consensus TEs that are up to 99% identical to manually annotated TEs.

Conclusions
While TEs are known to be a major force in the evolution of genomes, the automatic identification of TEs in genomes is far from mature. In particular, there is a lack of automated homology-based approaches that produce high-quality TEs. Our approach generates high-quality consensus TE sequences automatically, requiring the user to provide only a few basic parameters. It is intentionally modular, allowing researchers to use its components separately or iteratively. The approach is most effective for TEs with intact reading frames. The implementation, TESeeker, is available for download as a virtual appliance, and the library of representative TEs is available as a separate download.
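
The abstract does not detail how a consensus TE is derived from the ClustalW2 alignments or how identity to a manually annotated TE is measured. The following is a minimal Python sketch of one common approach, not TESeeker's code: the function names `majority_consensus` and `percent_identity`, the 50% majority threshold, the use of `N` for columns without a clear majority, and the gap-excluding identity measure are all illustrative assumptions.

```python
from collections import Counter

GAP = "-"


def majority_consensus(alignment: list[str], min_fraction: float = 0.5) -> str:
    """Majority-rule consensus of equal-length aligned sequences (illustrative).

    For each column, the most common character wins if it occurs in more than
    `min_fraction` of the sequences. A winning gap drops the column; a column
    with no clear winner is reported as the ambiguity code 'N'.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    consensus = []
    for column in zip(*alignment):
        base, count = Counter(c.upper() for c in column).most_common(1)[0]
        if count / len(column) > min_fraction:
            if base != GAP:  # a majority gap means the column is dropped
                consensus.append(base)
        else:
            consensus.append("N")  # no majority: ambiguous position
    return "".join(consensus)


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, ignoring gapped columns.

    This is only one of several conventions; BLAST, for example, reports
    identity over the full alignment length, gaps included.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != GAP and y != GAP]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


if __name__ == "__main__":
    copies = ["ATGC-A", "ATGCTA", "ATCC-A"]  # toy aligned TE copies
    cons = majority_consensus(copies)
    print(cons)                             # ATGCA
    print(percent_identity(cons, "ATGCT"))  # 80.0
```

A simple majority rule keeps the sketch short; a real pipeline would also need to handle low-coverage columns and degraded copies, which is where the quality of the underlying TE library and alignment matters most.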