Analysis of variable retroduplications in human populations suggests coupling of retrotransposition to cell division

In primates and other animals, reverse transcription of mRNA followed by genomic integration creates retroduplications. Expressed retroduplications are either "retrogenes" coding for functional proteins or expressed "processed pseudogenes," which can function as noncoding RNAs. To date, little is known about how retroduplications vary across individuals in the human population, that is, which copies are present or absent in a given person. We have developed new methodologies that allow us to identify "novel" retroduplications (i.e., those not present in the reference genome), to find their insertion points, and to genotype them. Using these methods, we catalogued and analyzed 174 retroduplication variants in almost one thousand individuals sequenced as part of Phase 1 of the 1000 Genomes Project. The accuracy of our data set was corroborated by (1) multiple lines of sequencing evidence for retroduplication (e.g., depth of coverage in exons vs. introns), (2) experimental validation, and (3) the fact that we can reconstruct a correct phylogenetic tree of human subpopulations based solely on retroduplications. We also show that parent genes of retroduplication variants tend to be expressed at the M-to-G1 transition of the cell cycle and that genes expressed at M-to-G1 have more copies of fixed retroduplications than genes expressed at other times. These findings suggest that cell division is coupled to retrotransposition and, perhaps, is even a requirement for it.
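The exon-versus-intron coverage evidence mentioned above can be illustrated with a small sketch. It is not the authors' pipeline; it assumes a simplified model in which a processed retrocopy carries only the parent gene's exons, the parent gene is diploid, and read depth scales linearly with copy number (ignoring mapping bias, GC effects and alternative splicing). Under that model, reads from a novel insertion map back onto the parent's exons, so exon depth is proportional to 2 + k while intron depth stays proportional to 2, where k is the number of retrocopy alleles. The function names (`region_mean_depth`, `exon_intron_ratio`, `call_retrocopy_genotype`) and the toy depth profile are hypothetical.

```python
from statistics import mean

# Hypothetical illustration: a processed retrocopy contains the parent gene's
# exons but not its introns, so reads from the insertion inflate exonic depth
# at the parent locus when aligned to the reference genome.

def region_mean_depth(depth, intervals):
    """Mean per-base depth over half-open [start, end) intervals."""
    values = [d for start, end in intervals for d in depth[start:end]]
    if not values:
        raise ValueError("no positions covered by the given intervals")
    return mean(values)

def exon_intron_ratio(depth, exons, introns):
    """Ratio of mean exonic to mean intronic read depth for one parent gene."""
    intron_depth = region_mean_depth(depth, introns)
    if intron_depth == 0:
        raise ValueError("zero intronic depth; ratio undefined")
    return region_mean_depth(depth, exons) / intron_depth

def call_retrocopy_genotype(ratio, parent_copies=2):
    """Assumed model: exon depth ~ parent_copies + k, intron depth ~ parent_copies.
    Solve for k, round it, and clamp to 0 (absent), 1 (heterozygous) or 2 (homozygous)."""
    k = parent_copies * (ratio - 1)
    return max(0, min(2, round(k)))

# Toy profile: two exons at ~30x flanking one intron at ~20x.
depth = [30] * 10 + [20] * 10 + [30] * 10
exons = [(0, 10), (20, 30)]
introns = [(10, 20)]
ratio = exon_intron_ratio(depth, exons, introns)
print(ratio)                           # 1.5
print(call_retrocopy_genotype(ratio))  # 1
```

In this toy case the exon-to-intron ratio of 1.5 corresponds to one extra exonic copy on top of a diploid parent gene, i.e., a heterozygous retroduplication under the stated assumptions. Real data would additionally need per-sample normalization and a noise model before such a ratio could support a genotype call.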
