Recent de novo origin of human protein-coding genes.

The origin of new genes is extremely important to evolutionary innovation. Most new genes arise from existing genes through duplication or recombination. The origin of new genes from noncoding DNA is extremely rare, and very few eukaryotic examples are known. We present evidence for the de novo origin of at least three human protein-coding genes since the divergence from chimpanzee. Each of these genes has no protein-coding homologs in any other genome, but is supported by expression data and, importantly, by proteomics data. The absence of these genes in chimp and macaque cannot be explained by sequencing gaps or annotation error: high-quality sequence data indicate that these loci are noncoding DNA in other primates. Furthermore, chimp, gorilla, gibbon, and macaque share the same disabling sequence difference, which supports the inference that the ancestral sequence was noncoding over the alternative possibility of parallel gene inactivation in multiple primate lineages. The genes are not well characterized, but, interestingly, one of them was first identified as an up-regulated gene in chronic lymphocytic leukemia. This is the first evidence for entirely novel human-specific protein-coding genes originating from ancestrally noncoding sequences. We estimate that 0.075% of human genes may have originated through this mechanism, leading to a total expectation of 18 such cases in a genome of 24,000 protein-coding genes.
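The closing estimate is a simple proportion, and the arithmetic can be checked directly. The sketch below is only an illustration of that calculation: the function name `expected_de_novo_genes` is hypothetical, and the two inputs (0.075% and 24,000 genes) are the figures quoted in the abstract.

```python
def expected_de_novo_genes(percent_de_novo: float, n_genes: int) -> float:
    """Expected number of de novo genes, given the percentage of genes
    assumed to have arisen de novo and the total protein-coding gene count."""
    # Convert the percentage to a fraction, then scale by the gene count.
    return (percent_de_novo / 100.0) * n_genes


# Figures taken from the abstract: 0.075% of 24,000 protein-coding genes.
estimate = expected_de_novo_genes(0.075, 24_000)
print(round(estimate))
```

Under these inputs the expectation rounds to the 18 cases stated in the text.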
