A cautionary note for retrocopy identification: DNA-based duplication of intron-containing genes significantly contributes to the origination of single exon genes

MOTIVATION: Retrocopies are important genes in the genomes of almost all higher eukaryotes. However, the annotation of such genes is a non-trivial task. Intronless genes have often been considered to be retroposed copies of intron-containing paralogs. Such categorization relies on the implicit premise that alignable regions of the duplicates should be long enough to cover exon-exon junctions of the intron-containing genes, and thus intron loss events can be inferred. Here, we examined the alternative possibility that intronless genes could be generated by partial DNA-based duplication of intron-containing genes in the fruitfly genome.

RESULTS: By building pairwise protein-, transcript- and genome-level DNA alignments between intronless genes and their corresponding intron-containing paralogs, we found that alignments do not cover exon-exon junctions in 40% of cases and thus no intron loss could be inferred. For these cases, the candidate parental proteins tend to be partially duplicated, and intergenic sequences or neighboring genes are included in the intronless paralog. Moreover, we observed that it is significantly less likely for these paralogs to show inter-chromosomal duplication and testis-dominant transcription, compared to the remaining 60% of cases with evidence of clear intron loss (retrogenes). These lines of analysis reveal that DNA-based duplication contributes significantly to the 40% of cases of single exon gene duplication. Finally, we performed an analogous survey in the human genome and the result is similar, wherein 34% of the cases do not cover exon-exon junctions. Thus, genome annotation for retrogene identification should discard candidates without clear evidence of intron loss.

CONTACT: mlong@uchicago.edu; zhangy@uchicago.edu
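
The junction-coverage criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the half-open transcript coordinates, and the `min_flank` parameter (how many aligned bases are required on each side of a junction) are all assumptions made for illustration only.

```python
from itertools import accumulate


def junction_positions(exon_lengths):
    """Return exon-exon junction positions in 0-based transcript coordinates.

    A junction at position p lies between bases p-1 and p of the spliced
    parental transcript. The last cumulative sum is the transcript end,
    not a junction, so it is dropped.
    """
    return list(accumulate(exon_lengths))[:-1]


def covered_junctions(exon_lengths, aln_start, aln_end, min_flank=1):
    """Junctions spanned by the half-open alignment [aln_start, aln_end).

    min_flank is a hypothetical threshold: the number of aligned bases
    required on each side of a junction before it counts as covered.
    """
    return [
        p
        for p in junction_positions(exon_lengths)
        if aln_start + min_flank <= p <= aln_end - min_flank
    ]


def classify(exon_lengths, aln_start, aln_end, min_flank=1):
    """Label a candidate by whether intron loss can be inferred at all."""
    if covered_junctions(exon_lengths, aln_start, aln_end, min_flank):
        return "intron loss inferable"
    return "no junction covered"


# Hypothetical parental transcript with three exons (junctions at 100 and 300).
exons = [100, 200, 150]
print(classify(exons, 50, 350))   # alignment spans both junctions
print(classify(exons, 120, 280))  # alignment lies within the second exon
```

Under these assumptions, a candidate whose alignment falls entirely within one parental exon is labelled as having no covered junction, which corresponds to the cases where the abstract says no intron loss could be inferred.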
