Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome.

Processed pseudogenes were created by reverse transcription of mRNAs; they provide snapshots of ancient genes that existed millions of years ago in the genome. To find them in the present-day human genome, we developed a pipeline using features such as intron absence, frame disruption, polyadenylation, and truncation. This has enabled us to identify approximately 8000 processed pseudogenes in recent genome drafts (distributed from http://pseudogene.org). Overall, processed pseudogenes are very similar to their closest corresponding human gene, being 94% complete in coding regions, with sequence similarity of 75% for amino acids and 86% for nucleotides. Their chromosomal distribution appears random and dispersed, with the number on each chromosome proportional to its length, suggesting sustained "bombardment" over evolution. However, the distribution does vary with GC content: processed pseudogenes occur mostly in regions of intermediate GC content. This is similar to Alus but contrasts with functional genes and L1 repeats. Pseudogenes, moreover, have age profiles similar to Alus. The number of pseudogenes associated with a given gene follows a power-law relationship, with a few genes giving rise to many pseudogenes and most giving rise to few. The prevalence of processed pseudogenes agrees well with germ-line gene expression. Highly expressed ribosomal proteins account for approximately 20% of the total. Other notable examples include cyclophilin-A, keratin, GAPDH, and cytochrome c.
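
The power-law relationship mentioned above can be illustrated with a small sketch. The abstract does not say how the relationship was fitted, so the following is only one common approach, assumed here for illustration: tally how many genes have exactly k pseudogenes, then fit a straight line to log(number of genes) against log(k). The function name `fit_power_law` and its input format are hypothetical and are not part of the published pipeline.

```python
import math
from collections import Counter


def fit_power_law(pseudogenes_per_gene):
    """Fit N(k) ~ prefactor * k**(-exponent) by least squares in log-log space.

    pseudogenes_per_gene: iterable with one integer per parent gene, giving
    the number of processed pseudogenes assigned to that gene (hypothetical
    input format, assumed for this sketch).
    Returns (exponent, prefactor).
    """
    # N(k): how many genes have exactly k pseudogenes (k = 0 cannot be logged).
    histogram = Counter(k for k in pseudogenes_per_gene if k > 0)
    if len(histogram) < 2:
        raise ValueError("need at least two distinct non-zero counts")

    xs = [math.log(k) for k in histogram]
    ys = [math.log(n) for n in histogram.values()]

    # Ordinary least-squares line through the (log k, log N(k)) points.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # A power law N(k) = c * k**(-b) is a line with slope -b and intercept log(c).
    return -slope, math.exp(intercept)
```

Called on a list with one count per gene, the function returns the estimated exponent and prefactor. A log-log least-squares fit is simple but can be biased by the sparse tail of the histogram, so it is best treated as a quick check rather than a rigorous estimate.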

[1]  J. V. Moran,et al.  Initial sequencing and analysis of the human genome. , 2001, Nature.

[2]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 , 2000, Nucleic Acids Res..

[3]  W. K. Alfred Yung,et al.  Identification of a candidate tumour suppressor gene, MMAC1, at chromosome 10q23.3 that is mutated in multiple advanced cancers , 1997, Nature Genetics.

[4]  Timothy B. Stockwell,et al.  The Sequence of the Human Genome , 2001, Science.

[5]  S. Spivack,et al.  mRNA-specific reverse transcription-polymerase chain reaction from human tissue extracts. , 2002, Analytical biochemistry.

[6]  M. Nei,et al.  Molecular Evolution and Phylogenetics , 2000 .

[7]  D. Bensasson,et al.  Frequent assimilation of mitochondrial DNA by grasshopper nuclear genomes. , 2000, Molecular biology and evolution.

[8]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[9]  Philippe Dessen,et al.  Structure and chromosomal distribution of human mitochondrial pseudogenes. , 2002, Genomics.

[10]  G. Bernardi,et al.  Similar integration but different stability of Alus and LINEs in the human genome. , 2001, Gene.

[11]  Ziheng Yang,et al.  PAML: a program package for phylogenetic analysis by maximum likelihood , 1997, Comput. Appl. Biosci..

[12]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[13]  Colin N. Dewey,et al.  Initial sequencing and comparative analysis of the mouse genome. , 2002 .

[14]  B. Barrell,et al.  Massive gene decay in the leprosy bacillus , 2001, Nature.

[15]  A. Mighell,et al.  Vertebrate pseudogenes , 2000, FEBS letters.

[16]  C. MacArthur,et al.  Cloning, expression and nuclear localization of human NPM3, a member of the nucleophosmin/nucleoplasmin family of nuclear chaperones , 2001, BMC Genomics.

[17]  A. Weiner,et al.  Do all SINEs lead to LINEs? , 2000, Nature Genetics.

[18]  M. Gerstein,et al.  A question of size: the eukaryotic proteome and the problems in defining it. , 2002, Nucleic acids research.

[19]  J. Davies,et al.  Molecular Biology of the Cell , 1983, Bristol Medico-Chirurgical Journal.

[20]  S. Humphreys,et al.  Sensitivity of immunohistochemistry and polymerase chain reaction in detecting prostate cancer cells in bone marrow. , 1994, The journal of histochemistry and cytochemistry : official journal of the Histochemistry Society.

[21]  C. Burge,et al.  Computational inference of homologous gene structures in the human genome. , 2001, Genome research.

[22]  M. Simmonds,et al.  Genome sequence of Yersinia pestis, the causative agent of plague , 2001, Nature.

[23]  S. Schuffenhauer,et al.  Cyclophilin A, the major intracellular receptor for the immunosuppressant cyclosporin A, maps to chromosome 7p11.2-p13: four pseudogenes map to chromosomes 3, 10, 14, and 18. , 1995, Genomics.

[24]  M. Griswold,et al.  Expression of prohibitin in rat seminiferous epithelium. , 1993, Biology of reproduction.

[25]  E. Birney,et al.  Mining the draft human genome , 2001, Nature.

[26]  B. Bramlage,et al.  Differential expression of the murine histone genes H3.3A and H3.3B. , 1997, Differentiation; research in biological diversity.

[27]  S. Gitelman,et al.  Abundant adrenal-specific transcription of the human P450c21A "pseudogene". , 1993, The Journal of biological chemistry.

[28]  D. Petrov,et al.  Genomic gigantism: DNA loss is slow in mountain grasshoppers. , 2001, Molecular biology and evolution.

[29]  N. Kenmochi,et al.  A complete map of the human ribosomal protein genes: assignment of 80 genes to the cytogenetic map and implications for human disorders. , 2001, Genomics.

[30]  M. Gerstein,et al.  The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties , 2002, Genome Biology.

[31]  J. Jurka,et al.  Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. , 1997, Proceedings of the National Academy of Sciences of the United States of America.

[32]  C. Dani,et al.  Post-transcriptional regulation of glyceraldehyde-3-phosphate-dehydrogenase gene expression in rat tissues. , 1984, Nucleic acids research.

[33]  Philip Lijnzaad,et al.  The Ensembl genome database project , 2002, Nucleic Acids Res..

[34]  J. Sellers,et al.  Human myosin XVBP is a transcribed pseudogene , 2004, Journal of Muscle Research & Cell Motility.

[35]  John C. Wootton,et al.  Statistics of Local Complexity in Amino Acid Sequences and Sequence Databases , 1993, Comput. Chem..

[36]  G Bernardi,et al.  Misunderstandings about isochores. Part 1. , 2001, Gene.

[37]  W. Li,et al.  Evidence for higher rates of nucleotide substitution in rodents than in man. , 1985, Proceedings of the National Academy of Sciences of the United States of America.

[38]  Mouse Genome Sequencing Consortium Initial sequencing and comparative analysis of the mouse genome , 2002, Nature.

[39]  M. Long,et al.  Intron-exon structures of eukaryotic model organisms. , 1999, Nucleic acids research.

[40]  G Bernardi,et al.  Isochores and the evolutionary genomics of vertebrates. , 2000, Gene.

[41]  Dan Graur,et al.  Deletions in processed pseudogenes accumulate faster in rodents than in humans , 1989, Journal of Molecular Evolution.

[42]  M. Nei,et al.  Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. , 1986, Molecular biology and evolution.

[43]  M. Daly,et al.  A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms , 2001, Nature.

[44]  J. Weissenbach,et al.  Mechanisms of Evolution in Rickettsia conorii and R. prowazekii , 2001, Science.

[45]  B. Barrell,et al.  Massive gene decay in the leprosy , 2001 .

[46]  I. Wool,et al.  Structure and evolution of mammalian ribosomal proteins. , 1995, Biochemistry and cell biology = Biochimie et biologie cellulaire.

[47]  Mark Gerstein,et al.  A small reservoir of disabled ORFs in the yeast genome and its implications for the dynamics of proteome evolution. , 2002, Journal of molecular biology.

[48]  N. Kenmochi,et al.  The human ribosomal protein genes: sequencing and comparative analysis of 73 genes. , 2002, Genome research.

[49]  Gustavo Glusman,et al.  The complete human olfactory subgenome. , 2001, Genome research.

[50]  W R Pearson,et al.  Comparison of DNA sequences with protein sequences. , 1997, Genomics.

[51]  Jef D Boeke,et al.  Human L1 Retrotransposon Encodes a Conserved Endonuclease Required for Retrotransposition , 1996, Cell.

[52]  T. Sicheritz-Pontén,et al.  The genome sequence of Rickettsia prowazekii and the origin of mitochondria , 1998, Nature.

[53]  K. Nishikawa,et al.  A systematic investigation identifies a significant number of probable pseudogenes in the Escherichia coli genome. , 2002, Gene.

[54]  M. Gerstein,et al.  The human genome has 49 cytochrome c pseudogenes, including a relic of a primordial gene that still functions in mouse. , 2003, Gene.

[55]  J. Bolen,et al.  Transcriptional analysis of the PTEN/MMAC1 pseudogene, ΨPTEN , 1999, Oncogene.

[56]  H. Robertson,et al.  The large srh family of chemoreceptor genes in Caenorhabditis nematodes reveals processes of genome evolution involving large duplications and deletions and intron gains and losses. , 2000, Genome research.

[57]  Evidence suggesting that a fifth of annotated Caenorhabditis elegans genes may be pseudogenes. , 2002, Genome research.

[58]  Y. Cheng,et al.  Identification of antisense RNA transcripts from a human DNA topoisomerase I pseudogene. , 1992, Cancer research.

[59]  Thierry Heidmann,et al.  Human LINE retrotransposons generate processed pseudogenes , 2000, Nature Genetics.

[60]  Alasdair J Edgar,et al.  The human L-threonine 3-dehydrogenase gene is an expressed pseudogene , 2002, BMC Genetics.

[61]  E. Vanin,et al.  Processed pseudogenes: characteristics and evolution. , 1985, Annual review of genetics.

[62]  M. Kimura A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences , 1980, Journal of Molecular Evolution.

[63]  L. Duret,et al.  Nature and structure of human genes that generate retropseudogenes. , 2000, Genome research.

[64]  N Goodman,et al.  A map of 75 human ribosomal protein genes. , 1998, Genome research.

[65]  M. Gerstein,et al.  Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. , 2002, Journal of molecular biology.

[66]  M. Gerstein,et al.  Identification and characterization of over 100 mitochondrial ribosomal protein pseudogenes in the human genome. , 2003, Genomics.

[67]  D. Petrov,et al.  High intrinsic rate of DNA loss in Drosophila , 1996, Nature.

[68]  E. Hovig,et al.  Identification of a novel cytokeratin 19 pseudogene that may interfere with reverse transcriptase‐polymerase chain reaction assays used to detect micrometastatic tumor cells , 1999, International journal of cancer.

[69]  G Bernardi,et al.  An approach to the organization of eukaryotic genomes at a macromolecular level. , 1976, Journal of molecular biology.

[70]  G Bernardi,et al.  The distribution of genes in the human genome. , 1991, Gene.

[71]  M. Woischnik,et al.  Pattern of organization of human mitochondrial pseudogenes in the nuclear genome. , 2002, Genome research.

[72]  Mark Gerstein,et al.  Identification of pseudogenes in the Drosophila melanogaster genome. , 2003, Nucleic acids research.

[73]  G. Webb,et al.  Antisense transcription of a murine FGFR-3 psuedogene during fetal developement. , 1997, Gene.

[74]  The Arabidopsis Genome Initiative Analysis of the genome sequence of the flowering plant Arabidopsis thaliana , 2000, Nature.

[75]  S Karlin,et al.  Genome-scale compositional comparisons in eukaryotes. , 2001, Genome research.

[76]  R. Scarpulla,et al.  The human somatic cytochrome c gene: two classes of processed pseudogenes demarcate a period of rapid molecular evolution. , 1988, Proceedings of the National Academy of Sciences of the United States of America.

[77]  L. Hurst The Ka/Ks ratio: diagnosing the form of sequence evolution. , 2002, Trends in genetics : TIG.

[78]  M. Gerstein,et al.  Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. , 2002, Genome research.

[79]  Mark Gerstein,et al.  Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. , 2002, Genome research.

[80]  D. A. O’Brien,et al.  Human glyceraldehyde 3-phosphate dehydrogenase-2 gene is expressed specifically in spermatogenic cells. , 2000, Journal of andrology.

[81]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[82]  N. Perna,et al.  Mitochondrial DNA: Molecular fossils in the nucleus , 1996, Current Biology.

[83]  K. Müller,et al.  Low specificity of cytokeratin 19 reverse transcriptase-polymerase chain reaction analyses for detection of hematogenous lung cancer dissemination. , 1995, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.

[84]  International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome , 2001, Nature.

[85]  F. Bonhomme,et al.  Concerted evolution in the GAPDH family of retrotransposed pseudogenes , 1993, Mammalian Genome.

[86]  R. Chapin,et al.  Cyclophilin A is present in rat germ cells and is associated with spermatocyte apoptosis. Reproductive Toxicology Group. , 1997, Biology of reproduction.

[87]  S. Povey,et al.  Members of the human glyceraldehyde-3-phosphate dehydrogenase-related gene family map to dispersed chromosomal locations. , 1989, Genomics.

[88]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[89]  M. Long,et al.  Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. , 1993, Science.

[90]  Z. Gu,et al.  Evolutionary analyses of the human genome , 2001, Nature.

[91]  Z. Yang,et al.  Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. , 2000, Molecular biology and evolution.

[92]  Douglas S. Portman,et al.  Cell type-specific expression of hnRNP proteins. , 1995, Experimental cell research.

[93]  T. Heidmann,et al.  mRNA retroposition in human cells: processed pseudogene formation. , 1995, The EMBO journal.

[94]  M. Gerstein,et al.  Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome. , 2001, Nucleic acids research.

[95]  D. Petrov,et al.  Pseudogene evolution and natural selection for a compact genome. , 2000, The Journal of heredity.