Analysis of the role of retrotransposition in gene evolution in vertebrates

Background: The dynamics of gene evolution are influenced by several genomic processes. One such process is retrotransposition, in which an mRNA transcript is reverse-transcribed and reintegrated into the genomic DNA.

Results: We surveyed eight vertebrate genomes (human, chimp, dog, cow, rat, mouse, chicken and the pufferfish Tetraodon nigroviridis) for putatively retrotransposed copies of genes. To gain a complete picture of the role of retrotransposition, we derived a robust strategy to identify putative retrogenes (PRs), in tandem with an adaptation of previous procedures to annotate processed pseudogenes, also called retropseudogenes (RψGs). Mammalian genomes are estimated to contain 400–800 PRs (corresponding to ~3% of genes), with fewer PRs and RψGs in the non-mammalian vertebrates. Focussing on human and mouse, we aged the PRs, analysed them for evidence of transcription and selection pressures, and assigned functional categories. Compared to the proteome in general, the PRs have significantly less transcription evidence mappable to them, are significantly less likely to arise from alternatively-spliced genes, and are statistically overrepresented for ribosomal-protein genes. We find evidence for spurts of gene retrotransposition in human and mouse since each lineage split from the dog lineage, with >200 PRs formed in mouse since its divergence from rat. To examine selection, we calculated: (i) Ka/Ks values (ratios of non-synonymous to synonymous substitutions in codons), and (ii) the significance of conservation of reading frames in PRs. In both human and mouse, we found >50 PRs formed since divergence from dog that are under pressure to maintain the integrity of their coding sequences. For different subsets of PRs formed at different stages of mammalian evolution, we find some evidence for non-neutral evolution, despite significantly less expression evidence for these sequences.

Conclusion: These results indicate that retrotranspositions are a significant source of novel coding sequences in mammalian gene evolution.
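To make the Ka/Ks measure mentioned in the Results concrete, the following is a minimal sketch of how such a ratio can be estimated from two aligned coding sequences. It is an illustration under stated assumptions, not the procedure used in this study: the abstract does not specify the estimation method, so the sketch swaps in a simplified Nei-Gojobori-style counting approach with a Jukes-Cantor correction. All function names (`syn_sites`, `count_diffs`, `jukes_cantor`, `ka_ks`) are hypothetical, changes to stop codons are simply counted as non-synonymous, and codons containing gaps or stops are skipped.

```python
import math
from itertools import permutations, product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}


def syn_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b == codon[pos]:
                continue
            mutant = codon[:pos] + b + codon[pos + 1:]
            # Simplification: a change to a stop codon counts as non-synonymous.
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                s += 1.0 / 3.0
    return s


def count_diffs(c1, c2):
    """Synonymous and non-synonymous differences between two codons,
    averaged over all orders in which the differing positions can change."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    syn = non = 0.0
    paths = list(permutations(diff_pos))
    for path in paths:
        cur = c1
        for pos in path:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODON_TABLE[nxt] == CODON_TABLE[cur]:
                syn += 1
            else:
                non += 1
            cur = nxt
    return syn / len(paths), non / len(paths)


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differing sites."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("sequences too divergent for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks, Ka/Ks) for two aligned, in-frame coding sequences.
    The ratio is None when Ks is zero."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # skip codons with gaps or ambiguous bases
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            continue  # skip stop codons
        s = (syn_sites(c1) + syn_sites(c2)) / 2.0
        S += s
        N += 3.0 - s
        sd, nd = count_diffs(c1, c2)
        Sd += sd
        Nd += nd
    ka = jukes_cantor(Nd / N) if N > 0 else 0.0
    ks = jukes_cantor(Sd / S) if S > 0 else 0.0
    return ka, ks, (ka / ks if ks > 0 else None)


if __name__ == "__main__":
    # Toy aligned sequences differing by one synonymous change (TTT -> TTC).
    print(ka_ks("TTTGGGGGGGGGATG", "TTCGGGGGGGGGATG"))
```

Under neutral evolution the ratio is expected to be close to 1, whereas values well below 1 are conventionally read as purifying selection on the coding sequence. A counting estimate like this one is only a rough guide for short or highly divergent sequences; likelihood-based estimators are generally preferred in practice.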
