Transposable Element Annotation in Completely Sequenced Eukaryote Genomes

With the development of new sequencing techniques, the number of sequenced plant genomes is increasing. However, accurate annotation of these sequences remains a major challenge, in particular with regard to transposable elements (TEs). The aim of this chapter is to provide a roadmap for researchers involved in genome projects to address this issue. We list several widely used tools for each step of the TE annotation process, from the identification of TE families to the annotation of TE copies. We assess how these tools complement one another and suggest that combined approaches, using both de novo and knowledge-based TE detection methods, are likely to produce reasonably comprehensive and sensitive results. Nevertheless, existing approaches still need to be supplemented by expert manual curation. We therefore also describe good practices for the manual curation of TE consensus sequences.
