detectMITE: A novel approach to detect miniature inverted repeat transposable elements in genomes

Miniature inverted repeat transposable elements (MITEs) are prevalent in eukaryotic genomes, including those of plants and animals. Classified as a type of non-autonomous DNA transposable element, they play important roles in genome organization and evolution. Comprehensive and accurate genome-wide detection of MITEs in diverse eukaryotic genomes can improve our understanding of their origins, transposition processes, regulatory mechanisms, and biological relevance to gene structure, expression, and regulation. In this paper, we present a new MATLAB-based program called detectMITE, which employs a novel numeric calculation algorithm in place of conventional string matching algorithms for MITE detection, adopts the Lempel-Ziv complexity algorithm to filter out low-complexity MITE candidates, and uses the clustering program CD-HIT to group similar MITEs into MITE families. Using the rice genome as test data, we found that detectMITE detects MITEs on a genome-wide scale more accurately, comprehensively, and efficiently than other popular MITE detection tools. Comparison with the potential MITEs annotated in Repbase, the widely used eukaryotic repeat database, shows that detectMITE finds both known and novel MITEs with a complete structure and full-length copies in the genome. detectMITE is an open-source tool (https://sourceforge.net/projects/detectmite).
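The low-complexity filtering step can be illustrated with a short sketch. The Python code below is not taken from detectMITE, which is written in MATLAB; it is a minimal implementation of the Lempel-Ziv (1976) phrase count, assuming the exhaustive-history parsing of that paper. The function names, the normalization by log base 4 of the sequence length, and the threshold value are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def lz76_complexity(seq: str) -> int:
    """Count the phrases in the Lempel-Ziv (1976) exhaustive parsing of seq."""
    n = len(seq)
    start = 0
    phrases = 0
    while start < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        # The search window may overlap the phrase, excluding its last symbol.
        while (start + length <= n
               and seq[start:start + length] in seq[:start + length - 1]):
            length += 1
        phrases += 1
        start += length
    return phrases


def normalized_complexity(seq: str, alphabet_size: int = 4) -> float:
    """Scale the phrase count by n / log_k(n) (an assumed normalization)."""
    n = len(seq)
    if n < 2:
        return 0.0
    return lz76_complexity(seq) * math.log(n, alphabet_size) / n


def passes_complexity_filter(seq: str, threshold: float = 0.6) -> bool:
    """Keep a candidate only if it is complex enough.

    The threshold is a hypothetical placeholder, not detectMITE's setting.
    """
    return normalized_complexity(seq) >= threshold
```

As a check on the parsing, the binary string `0001101001000101` splits into the six phrases 0, 001, 10, 100, 1000, 101, and a homopolymer such as `AAAAAAAA` splits into only two (A, AAAAAAA). A filter of this kind would therefore discard simple repeats while retaining candidates with more varied sequence content.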
