The Birth of New Genes by RNA- and DNA-Mediated Duplication during Mammalian Evolution

Gene duplication has long been recognized as a major force in genome evolution and, more recently, as an important source of individual variation. For many years, functional gene duplicates were assumed to originate from whole or partial genome duplication events, but retrotransposition has recently also been shown to contribute new functional protein-coding genes and siRNAs. In this study, we use pseudogenes to reconstruct more complete gene family histories, and compare the rates of RNA- and DNA-mediated duplication and of new functional gene formation in five mammalian genomes. We find that RNA-mediated duplication occurs at a much higher and more variable rate than DNA-mediated duplication, and gives rise to many more duplicated sequences over time. We show that, while the chance of an RNA-mediated duplicate becoming functional is much lower than that of its DNA-mediated counterpart, the higher rate of retrotransposition leads to nearly equal contributions of new genes by each mechanism. We also find that functional RNA-mediated duplicates are closer to neighboring genes than non-functional RNA-mediated copies, consistent with co-option of regulatory elements at the site of insertion. Overall, new genes derived from DNA- and RNA-mediated duplication mechanisms are under similar levels of purifying selective pressure, but have broadly different functions. RNA-mediated duplication gives rise to a diversity of genes but is dominated by the highly expressed genes of RNA metabolic pathways. DNA-mediated duplication can copy regulatory material along with the protein-coding region of the gene, and often gives rise to classes of genes whose functions depend on complex regulatory information. This mechanistic difference may in part explain why we find that mammalian protein families tend to evolve by either one mechanism or the other, but rarely by both.
Supplementary Material has been provided (see online Supplementary Material at www.liebertonline.com).
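
The rate argument in the abstract (a high duplication rate combined with a low chance of becoming functional can yield about as many new genes as a low rate combined with a high chance) can be illustrated with a minimal sketch. The function name and all numeric values below are hypothetical placeholders chosen only to show the arithmetic; they are not estimates from the study.

```python
def expected_new_genes(duplication_rate: float, p_functional: float) -> float:
    """Expected number of new functional genes per unit time.

    duplication_rate: duplicates created per unit time (hypothetical units).
    p_functional: probability that a single duplicate becomes functional.
    """
    if duplication_rate < 0:
        raise ValueError("duplication_rate must be non-negative")
    if not 0.0 <= p_functional <= 1.0:
        raise ValueError("p_functional must lie in [0, 1]")
    return duplication_rate * p_functional


# Hypothetical placeholder values, NOT taken from the study:
# RNA-mediated: many copies, each with a small chance of becoming functional.
rna_contribution = expected_new_genes(duplication_rate=100.0, p_functional=0.01)
# DNA-mediated: fewer copies, each with a larger chance of becoming functional.
dna_contribution = expected_new_genes(duplication_rate=10.0, p_functional=0.1)

print(f"RNA-mediated: {rna_contribution:.2f} new genes per unit time")
print(f"DNA-mediated: {dna_contribution:.2f} new genes per unit time")
```

With these placeholder inputs, a tenfold difference in rate is offset by a tenfold difference in per-copy probability, which is the kind of balance the abstract describes qualitatively.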
