Comparative Genomics Search for Losses of Long-Established Genes on the Human Lineage

Taking advantage of the complete genome sequences of several mammals, we developed a novel method to detect losses of well-established genes in the human genome through syntenic mapping of gene structures between the human, mouse, and dog genomes. Unlike most previous genomic methods for pseudogene identification, this analysis can differentiate losses of well-established genes from pseudogenes formed shortly after segmental duplication or generated via retrotransposition. It therefore enables us to find genes that were inactivated long after their birth and that likely had evolved nonredundant biological functions before being inactivated. We used the method to search for gene losses along the human lineage during the approximately 75 million years (My) since the common ancestor of primates and rodents (the euarchontoglire crown group). We identified 26 losses of well-established genes in the human genome, each lost at least 50 My after its birth. Many of them were previously characterized human pseudogenes, such as GULO and UOX. Our methodology is highly effective at identifying losses of single-copy genes of ancient origin, which allowed us to find a few well-known human pseudogenes missed by previous high-throughput genome-wide studies. In addition to confirming previously known gene losses, we identified 16 previously uncharacterized human pseudogenes that are definitive losses of long-established genes. Among them is ACYL3, a gene encoding an ancient enzyme found in archaea, bacteria, and eukaryotes, which was lost approximately 6 to 8 Mya in the ancestor of humans and chimps. Although losses of well-established genes do not equate to adaptive gene losses, they are a useful proxy when searching for such genetic changes. This is especially true for adaptive losses that occurred more than 250,000 years ago, since any genetic evidence of the selective sweep that would signal such an event has since been erased.
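The abstract does not spell out how a disrupted gene is recognized or how a loss is assigned to the human lineage. The following is a minimal sketch of the two ideas involved, not the authors' pipeline. It assumes a pairwise alignment between the human syntenic locus and an intact ortholog coding sequence has already been computed, with gaps written as `-`. The function names `find_disabling_mutations` and `classify_locus` are hypothetical. The first function flags frameshifting indels and premature stop codons read in the ortholog's frame. The second applies a simple parsimony rule on the tree ((human, mouse), dog): if the gene is intact at the syntenic position in both mouse and dog, it was most likely intact in the primate–rodent ancestor, so a disrupted human copy points to a loss on the human lineage.

```python
import re

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_disabling_mutations(human_row: str, ortholog_row: str) -> list[str]:
    """Report frameshifts and premature stop codons in a human locus.

    Hypothetical helper. Both arguments are rows of a precomputed pairwise
    alignment (gaps written as '-'). ortholog_row is assumed to be an intact
    coding sequence that starts at its start codon and ends with its stop codon.
    """
    if len(human_row) != len(ortholog_row):
        raise ValueError("alignment rows must be the same length")
    events = []

    # Frameshifts: any gap run whose length is not a multiple of 3.
    # Gaps in the human row are human deletions; gaps in the ortholog row
    # are human insertions.
    for label, row in (("deletion", human_row), ("insertion", ortholog_row)):
        for run in re.finditer(r"-+", row):
            if len(run.group()) % 3:
                events.append(
                    f"frameshift {label} of {len(run.group())} nt at column {run.start()}"
                )

    # Premature stops: read the human bases in the ortholog's reading frame.
    codon_cols = [i for i, base in enumerate(ortholog_row) if base != "-"]
    n_codons = len(codon_cols) // 3
    for k in range(n_codons - 1):  # the last codon is the ortholog's own stop
        cols = codon_cols[3 * k : 3 * k + 3]
        if cols[2] - cols[0] != 2:
            continue  # a human insertion splits this codon; skip it
        codon = "".join(human_row[i] for i in cols).upper()
        if codon in STOP_CODONS:
            events.append(f"premature stop {codon} at codon {k + 1}")
    return events


def classify_locus(human_intact: bool, mouse_intact: bool, dog_intact: bool) -> str:
    """Parsimony call for one syntenic locus on the tree ((human, mouse), dog).

    An intact gene in both mouse and dog implies it was intact in the
    primate-rodent ancestor, so a disrupted human copy is most parsimoniously
    a loss on the human lineage. Other patterns are left unresolved here.
    """
    if mouse_intact and dog_intact:
        return "conserved" if human_intact else "loss on human lineage"
    if human_intact and dog_intact:
        return "loss on mouse lineage"
    return "unresolved"
```

In use, one might set `human_intact = not find_disabling_mutations(human_row, mouse_row)` and pass that to `classify_locus`. Requiring an intact copy in dog as well as mouse is what lets a call like this separate the loss of a long-established gene from a young duplicate or a retrotransposed copy that never had an ancient ortholog at that position. A real pipeline would also need to handle alignment quality, splice sites, and assembly errors, none of which this sketch attempts.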
