RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing

Background. Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs on gene regulation and on the generation of genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation are underappreciated. Inexpensive, deep sequencing technology has made it affordable to apply population genetic methods to whole genomes, using methods that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites, can be challenging because of the repetitive nature of these sequences, which hampers both the sensitivity and the specificity of analysis tools.

Methods. We have developed the tool RelocaTE2 to identify TE insertion sites with high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second generation sequencing platforms such as Illumina. The reads that contain TE sequence are used as seeds to pinpoint the chromosome locations where TEs have transposed. RelocaTE2 detects the target site duplications (TSDs) of TE insertions, which allows it to report TE polymorphism loci with single base pair precision.

Results and Discussion. The performance of RelocaTE2 was evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a high level of sensitivity and specificity, particularly when sequence coverage is not shallow. Among the tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity, and it performs best in predicting the TSDs of TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 reports up to 95% of simulated TE insertions with a false positive rate below 0.1% using 10-fold genome coverage resequencing data.

RelocaTE2 provides a robust solution for identifying TE insertion sites and can be incorporated into analysis workflows that aim to describe the complete genotype from low-coverage genome sequencing.
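
The seed-and-TSD idea described in the Methods can be illustrated with a toy example. The sketch below is not RelocaTE2's implementation: the function names `find_flank` and `infer_insertion`, the `min_match` parameter, and all sequences are hypothetical, exact string matching stands in for real read alignment, and only forward-strand reads are handled. It shows how the genomic flanks of reads spanning the two TE/genome junctions overlap on the reference by exactly the TSD.

```python
def find_flank(read, te, min_match=10):
    """Return (side, genomic_flank) if the read spans a TE junction, else None.

    Toy stand-in for aligning reads to a TE library: looks for an exact
    match to the first or last `min_match` bases of the TE.
    """
    head, tail = te[:min_match], te[-min_match:]
    i = read.find(head)
    if i > 0:
        # Genomic sequence precedes the TE start: left junction read.
        return ("left", read[:i])
    j = read.find(tail)
    if j != -1 and j + min_match < len(read):
        # Genomic sequence follows the TE end: right junction read.
        return ("right", read[j + min_match:])
    return None


def infer_insertion(reference, reads, te, min_match=10):
    """Return (start, end, tsd) in 0-based, end-exclusive reference coordinates.

    The upstream flank ends with the TSD and the downstream flank starts
    with it, so the two flanks overlap on the reference by the TSD.
    """
    left_end = right_start = None
    for read in reads:
        hit = find_flank(read, te, min_match)
        if hit is None:
            continue
        side, flank = hit
        pos = reference.find(flank)  # exact match replaces a real aligner
        if pos == -1:
            continue
        if side == "left":
            left_end = pos + len(flank)
        else:
            right_start = pos
    if left_end is None or right_start is None or right_start >= left_end:
        return None  # junction missing, or no overlap and hence no TSD
    return right_start, left_end, reference[right_start:left_end]


# Hypothetical data: a 30 bp TE inserted at a 3 bp target site.
te = "GGCCAGTCAC" + "AATGGCTAGT" + "GTCATTGCAC"
upstream, tsd, downstream = "ATCGGATACCGT", "TAA", "GCATTGACCGAT"
reference = upstream + tsd + downstream
sample = upstream + tsd + te + tsd + downstream  # genome carrying the insertion

reads = [
    sample[7:27],     # spans the left junction
    sample[33:53],    # spans the right junction
    te[5:25],         # lies entirely inside the TE: ignored
    reference[0:10],  # purely genomic: ignored
]
result = infer_insertion(reference, reads, te)
print(result)
```

With these toy sequences the two junction flanks overlap on the reference at positions 12 to 15, recovering the 3 bp TSD `TAA`. A real tool must additionally handle sequencing errors, both strands, repeated flanks, and support from multiple reads.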
