The UCSC repeat browser allows discovery and visualization of evolutionary conflict across repeat families

Background: Nearly half the human genome consists of repeat elements, most of which are retrotransposons, and many of these sequences play important biological roles. However, repeat elements pose several unique challenges to current bioinformatic analyses and visualization tools, because short repeat sequences can map to multiple genomic loci, resulting in their misclassification and misinterpretation. In fact, sequence data mapping to repeat elements are often discarded from analysis pipelines. There is therefore a continued need for standardized tools and techniques to interpret genomic data of repeats.

Results: We present the UCSC Repeat Browser, which consists of a complete set of human repeat reference sequences derived from the gold-standard repeat database RepeatMasker. The UCSC Repeat Browser contains annotations mapped from the human genome to these references and presents them all in a comprehensive interface to facilitate work with repetitive elements. Furthermore, it provides processed tracks of multiple publicly available datasets of biological interest to the repeat community, including ChIP-seq datasets for KRAB Zinc Finger Proteins (KZNFs), a family of proteins known to bind and repress certain classes of repeats. Here we show how the UCSC Repeat Browser, in combination with these datasets and with RepeatMasker annotations in several non-human primates, can be used to trace the independent trajectories of species-specific evolutionary conflicts.

Conclusions: The UCSC Repeat Browser allows easy and intuitive visualization of genomic data on consensus repeat elements, circumventing the problem of multi-mapping, in which sequencing reads of repeat elements map to multiple locations on the human genome. Because data are placed on a reference consensus, multiple datasets and annotation tracks can easily be overlaid to reveal complex evolutionary histories of repeats in a single interactive window. Specifically, we use this approach to retrace the history of several primate-specific LINE-1 families across apes, and discover several species-specific routes of evolution that correlate with the emergence and binding of KZNFs.
