Mobile genomics: tools and techniques for tackling transposons

Next-generation sequencing approaches have fundamentally changed the types of questions that can be asked about gene function and regulation. With the goal of approaching truly genome-wide quantification of all the interaction partners and downstream effects of particular genes, these quantitative assays have allowed an unprecedented level of detail in exploring biological interactions. However, many challenges remain in our ability to accurately describe and quantify the interactions that take place in the hard-to-reach and extremely repetitive regions of our genome composed mostly of transposable elements (TEs). Tools dedicated to TE-derived sequences have lagged behind, making the inclusion of these sequences in genome-wide analyses difficult. Recent improvements, both computational and experimental, allow better inclusion of TE sequences in genomic assays and have brought a renewed appreciation for the importance of TE biology. This review discusses the recent improvements that have been made in the computational analysis of TE-derived sequences as well as the areas where such analysis still proves difficult. This article is part of a discussion meeting issue ‘Crossroads between transposons and gene regulation’.
