Insights into mammalian TE diversity through the curation of 248 genome assemblies

We examined the transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. We found that although mammals resemble one another in total TE content and diversity, they show substantial differences with regard to recent TE accumulation. This includes multiple recent expansion and quiescence events across the mammalian tree. Young TEs, particularly long interspersed elements, drive increases in genome size, whereas DNA transposons are associated with smaller genomes. Mammals tend to accumulate only a few types of TEs at any given time, with one TE type dominating. We also found an association between dietary habit and the presence of DNA transposon invasions. These detailed annotations will serve as a benchmark for future comparative TE analyses among placental mammals.

INTRODUCTION An estimated 160 million years have passed since the first placental mammals evolved. These eutherians are categorized into 19 orders consisting of nearly 4000 extant species, with ~70% being bats or rodents. Broad, in-depth, and comparative genomic studies across Eutheria have previously been unachievable because of the lack of genomic resources. The collaboration of the Zoonomia Consortium made available hundreds of high-quality genome assemblies for comparative analysis. Our focus within the consortium was to investigate the evolution of transposable elements (TEs) among placental mammals. Using these data, we identified previously known TEs, described previously unknown TEs, and analyzed the TE distribution at multiple taxonomic levels.

RATIONALE The emergence of accurate and affordable sequencing technology has propelled efforts to sequence an increasing number of nonmodel mammalian genomes in the past decade. Most of these efforts have traditionally focused on genic regions, searching for patterns of selection or variation in gene regulation. The common trend of ignoring or trivializing TE annotation in newly published genomes has resulted in a severe lag in TE analyses, leaving extensive TE variation undiscovered. This oversight has neglected an important source of evolutionary change, because TE accumulation contributes to drastic alterations in genome architecture, including insertions, deletions, duplications, translocations, and inversions. Our approach to the Zoonomia dataset was to provide future investigators with accurate and meticulous TE curations and to describe taxonomic variation among eutherians.

RESULTS We annotated the TE content of 248 mammalian genome assemblies, which yielded a library of 25,676 consensus TE sequences, 8263 of which were previously unidentified TE sequences (available at https://dfam.org). We affirmed that TEs make up the largest component of a typical mammalian genome (average 45.6%). Of the 248 assemblies, the lowest genomic percentage of TEs was found in the star-nosed mole (27.6%), and the highest was seen in the aardvark (74.5%), whose increase in TE accumulation drove a corresponding increase in genome size, a correlation we observed across Eutheria. The overall genomic proportions of recently accumulated TEs were roughly similar across most mammals in the dataset, with a few notable exceptions (see the figure). Diversity of recently accumulated TEs is highest among multiple families of bats, mostly driven by substantial DNA transposon activity. Our data also show more recently accumulated DNA transposons in carnivore lineages than in their herbivorous counterparts, which suggests that diet may play a role in determining the genomic content of TEs.

CONCLUSION The copious TE data provided in this work come from the largest comprehensive TE curation effort to date. Considering the wide-ranging effects that TEs impose on genomic architecture, these data are an important resource for future inquiries into mammalian genomics and evolution and suggest avenues for continued study of these important yet understudied genomic denizens.

Boxplots depicting the range of recently accumulated TEs among mammals (by proportion of genome). Five categories of TE were examined: DNA transposons, long interspersed elements (LINEs), long terminal repeat (LTR) retrotransposons, rolling circle (RC) transposons, and short interspersed elements (SINEs). Species with the highest and lowest proportions for each TE type are indicated by a picture of the organism and its common name. With regard to RC and DNA transposons, we found that most mammalian genome assemblies exhibit essentially zero recent accumulation (RC: 240 of 248 mammals had <0.1%; DNA: 210 of 248 mammals had <0.1%). ILLUSTRATIONS: BRITTANY ANN HALE
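The counts quoted above can be restated as proportions, which may make the scale of the findings easier to compare. The following is only a small arithmetic sketch in Python: the variable and function names are illustrative assumptions, and every input number is taken directly from the summary text rather than from the underlying dataset.

```python
# Counts quoted in the summary above; no new data is introduced here.
TOTAL_ASSEMBLIES = 248
TOTAL_CONSENSUS = 25_676   # consensus TE sequences in the library
NEW_CONSENSUS = 8_263      # previously unidentified consensus sequences
RC_NEAR_ZERO = 240         # assemblies with <0.1% recent RC accumulation
DNA_NEAR_ZERO = 210        # assemblies with <0.1% recent DNA transposon accumulation


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "new_consensus_pct": percent(NEW_CONSENSUS, TOTAL_CONSENSUS),
    "rc_near_zero_pct": percent(RC_NEAR_ZERO, TOTAL_ASSEMBLIES),
    "dna_near_zero_pct": percent(DNA_NEAR_ZERO, TOTAL_ASSEMBLIES),
    # Assemblies at or above the 0.1% threshold, by subtraction.
    "rc_active_count": TOTAL_ASSEMBLIES - RC_NEAR_ZERO,
    "dna_active_count": TOTAL_ASSEMBLIES - DNA_NEAR_ZERO,
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Read this way, roughly a third of the consensus library is newly described, and only 8 assemblies for RC transposons and 38 for DNA transposons reach the 0.1% threshold for recent accumulation.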

Voichita D. Marinescu | Andreas R. Pfenning | Matthew G. Johnson | Graham M. Hughes | BaDoi N. Phan | Irene M. Kaplow | Pardis C Sabeti | F. Di Palma | B. Birren | K. Lindblad-Toh | Z. Weng | M. Diekhans | K. Pollard | T. Marquès-Bonet | H. Clawson | B. Paten | O. Wallerman | W. Murphy | R. Hubley | E. Karlsson | E. Teeling | A. Navarro | G. Muntané | M. Springer | E. Eizirik | Jill E. Moore | S. Gazal | B. Shapiro | H. Lewin | Steven K. Reilly | Oliver A. Ryder | D. Ray | Jason Turner-Maier | C. Steiner | Jeremy Johnson | K. Fan | J. Meadows | Diana D. Moreno-Santillán | S. Kozyrev | L. Dávalos | M. Christmas | K. Koepfli | Morgan E. Wirthlin | Ross Swofford | G. Hickey | Abigail L. Lind | Joana Damas | Kathleen Morrill | Nicole M. Foley | J. Gatesy | R. Stevens | Alyssa J. Lawler | Joy-El R B Talbot | T. Lehmann | P. Sullivan | Kathleen C. Keough | K. Forsberg-Nilsson | L. Densmore | D. Genereux | Chaitanya Srinivasan | E. Sundström | Daniel E. Schäffer | David Juan | M. Nweeia | B. Kirilenko | S. Ortmann | Arian F. A. Smit | Aryn P. Wilder | Aitor Serres | Carlos J. Garcia | Juehan Wang | Chao Wang | I. Ruf | A. Valenzuela | Jessica M. Storer | M. Bianchi | Amanda Kowalczyk | C. Lawless | Xue Li | D. Levesque | Xiaomeng Zhang | Wynn K. Meyer | Jeb Rosen | A. Breit | Victor C. Mason | Andrew J. Harris | K. Bredemeyer | Nicole S. Paulat | Austin B. Osmanski | Michael Hiller | L. R. Moreira | Megan A. Supple | J. Korstian | Franziska Wagner | Ava Mackay-Smith | Jenna R. Grimshaw | Michaela K. Halsey | Kevin A. M. Sullivan | Carlos Garcia | H. Pratt | Allyson Hindle | Louise Ryan | Linda Goodman | Michael X. Dong | Joel C. Armstrong | Claudia Crookshanks | Jacquelyn Roberts | James R. Xue | Gregory Andrews | Cornelia Fanter
