Accurate Transposable Element Annotation Is Vital When Analyzing New Genome Assemblies

Transposable elements (TEs) are mobile genetic elements that can replicate themselves throughout the host genome. In some taxa, TEs reach copy numbers in the hundreds of thousands and can occupy more than half of the genome. The increasing number of reference genomes from nonmodel species has begun to outpace efforts to identify and annotate TE content, and the methods used vary significantly between projects. Here, we demonstrate the variation that arises in TE annotations when suboptimal methods are used. We found that, across a variety of taxa, the ability to accurately identify TEs based solely on homology decreased as the phylogenetic distance between the queried genome and a reference increased. Next, we annotated repeats in two ways: using homology alone, as is often the case in new genome analyses, and using a combination of homology and de novo methods together with an additional manual curation step. Reannotation using the combined approach identified a substantial number of new TE subfamilies in previously characterized genomes, recognized a higher proportion of the genome as repetitive, and decreased the average genetic distance within TE families, implying recent TE accumulation. Finally, these findings (increased recognition of younger TEs) were confirmed via an analysis of the postman butterfly (Heliconius melpomene). These observations imply that complete TE annotation relies on a combination of homology and de novo–based repeat identification, manual curation, and classification, and that relying on simple, homology-based methods is insufficient to accurately describe the TE landscape of a newly sequenced genome.
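To make the link between within-family genetic distance and TE age concrete, here is a minimal Python sketch. It assumes the distance is the Kimura two-parameter (K2P) distance between each TE copy and its family consensus, a common choice for TE divergence; the abstract itself does not name the measure. The function names and toy sequences are hypothetical, and the inputs are assumed to be pre-aligned sequences of equal length.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    Assumption: sequences are already aligned and of equal length.
    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


def mean_family_distance(copies, consensus):
    """Average K2P distance of each copy from the family consensus."""
    return sum(k2p_distance(c, consensus) for c in copies) / len(copies)


# Toy data (hypothetical): a recently active family has accumulated
# fewer substitutions relative to its consensus than an older one.
consensus = "ACGTACGTACGTACGTACGT"
young_copies = ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGC"]
old_copies = ["GCGTACTTACGTACATACGT", "ACATACGTCCGTACGTATGT"]

young = mean_family_distance(young_copies, consensus)
old = mean_family_distance(old_copies, consensus)
print(f"young family mean K2P: {young:.4f}")
print(f"old family mean K2P:   {old:.4f}")
```

Under this assumption, a lower mean distance indicates copies that have had less time to diverge from their source element, which is the reasoning behind reading a decrease in within-family distance as evidence of recent accumulation.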
