Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline

Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs), which allows the TEs themselves to be annotated. Numerous annotation methods exist for each class of elements, but their relative performance is unknown. We benchmarked existing programs against a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotating both structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
