Quantification and modeling of turnover dynamics of de novo transcripts in Drosophila melanogaster

Most of the transcribed portion of eukaryotic genomes consists of non-coding transcripts. Some of these transcripts are newly transcribed relative to outgroups and are referred to as de novo transcripts. De novo transcripts have been shown to play a major role in de novo gene emergence. However, little is known about the rates at which de novo transcripts are gained and lost among individuals of the same species. Here, we address this gap and estimate, for the first time, the de novo transcript turnover rate. We use DNA long reads and RNA short reads from seven samples of inbred individuals of Drosophila melanogaster to detect de novo transcripts that are (transiently) gained on a short evolutionary time scale. Overall, each sampled individual contains between 2,320 and 2,809 unspliced de novo transcripts, most of which are sample-specific. We estimate that around 0.15 transcripts are gained per year, and that each gained transcript is lost at a rate of around 5 × 10⁻⁵ per year. This high turnover of transcripts suggests frequent exploration of new genomic sequences within species. These rates provide the first empirical estimates with which to better predict and understand the process of de novo gene birth.
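
To give a feel for what the two reported rates imply, the following minimal sketch plugs them into a simple constant-rate gain/loss model, dN/dt = gain − loss · N. This model is an assumption made here for illustration only and is not necessarily the model used in the paper; the function and variable names are hypothetical. Under this assumption, the mean lifetime of a gained transcript is 1/loss (about 20,000 years) and the expected number of transcripts settles at gain/loss (about 3,000).

```python
import math

# Rates quoted in the abstract (approximate values).
GAIN_PER_YEAR = 0.15   # de novo transcripts gained per year
LOSS_PER_YEAR = 5e-5   # per-transcript loss rate per year


def expected_count(t_years, n0=0.0, gain=GAIN_PER_YEAR, loss=LOSS_PER_YEAR):
    """Expected transcript count after t_years, starting from n0.

    Assumes a toy linear model dN/dt = gain - loss * N (an illustrative
    assumption, not the paper's stated method).
    """
    equilibrium = gain / loss
    return equilibrium + (n0 - equilibrium) * math.exp(-loss * t_years)


# Quantities implied by the toy model.
mean_lifetime_years = 1.0 / LOSS_PER_YEAR          # mean persistence of one transcript
half_life_years = math.log(2) / LOSS_PER_YEAR      # time until half of a cohort is lost
equilibrium_count = GAIN_PER_YEAR / LOSS_PER_YEAR  # level where gain balances loss

if __name__ == "__main__":
    print(f"mean lifetime:     {mean_lifetime_years:,.0f} years")
    print(f"half-life:         {half_life_years:,.0f} years")
    print(f"equilibrium count: {equilibrium_count:,.0f} transcripts")
    print(f"count after 10,000 years from zero: {expected_count(10_000):,.0f}")
```

The equilibrium depends only on the ratio of the two rates, so any uncertainty in either estimate carries over proportionally to the implied standing number of transcripts.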
