Pseudofam: the pseudogene families database

Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families in the Pfam database. It provides resources for analyzing the family structure of pseudogenes, including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125 000 pseudogenes identified from 10 eukaryotic genomes and aligned within nearly 3000 families (approximately one-third of all families in Pfam-A). Pseudofam identifies pseudogenes with a large-scale, parallelized homology search algorithm, implemented as an extension of the PseudoPipe pipeline. Each identified pseudogene is assigned to its parent protein family, and the pseudogenes in a family are then aligned to one another by transferring the parent domain alignments from the Pfam family. Pseudogenes are further annotated using an ontology that reflects their mode of creation and subsequent history. In particular, this annotation highlights the association of pseudogene families with genomic features such as segmental duplications. Each pseudogene family is also summarized by key statistics that identify outlier families with an unusual degree of pseudogenization. The statistics also show how the numbers of genes and pseudogenes in a family correlate across species. Overall, they show that housekeeping families tend to contain large numbers of pseudogenes.
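The alignment-transfer step can be illustrated with a small sketch. It assumes a simplified setup that the abstract does not specify: the parent domain's row in the Pfam family alignment uses "-" as its only gap symbol, and each pseudogene has already been aligned pairwise to its parent domain at the protein level. The function name `transfer_alignment` and its inputs are hypothetical, and dropping pseudogene insertions is a simplification chosen here, not necessarily what Pseudofam does.

```python
# Hypothetical sketch (not Pseudofam's code): transfer a parent protein's
# row in a family alignment onto a pseudogene aligned to that parent.

GAP = "-"  # assumption: a single gap symbol; Stockholm files also use "."


def transfer_alignment(parent_row: str, parent_aln: str, pseudo_aln: str) -> str:
    """Return the pseudogene's row laid out in the family alignment's columns.

    parent_row: the parent domain's gapped row from the family alignment.
    parent_aln, pseudo_aln: the two rows of a pairwise alignment between
        the parent domain and the (translated) pseudogene.
    """
    if len(parent_aln) != len(pseudo_aln):
        raise ValueError("pairwise alignment rows must have equal length")
    if parent_row.replace(GAP, "") != parent_aln.replace(GAP, ""):
        raise ValueError("parent residues differ between the two alignments")

    # For each parent residue, record the pseudogene symbol opposite it.
    # A gap here means the pseudogene lost that residue (a deletion).
    # Pseudogene residues opposite a parent gap are insertions with no
    # family column, so this sketch simply drops them.
    opposite = [q for p, q in zip(parent_aln, pseudo_aln) if p != GAP]

    # Walk the family row: gap columns stay gaps, residue columns take the
    # pseudogene symbol aligned to that parent residue.
    residues = iter(opposite)
    return "".join(GAP if c == GAP else next(residues) for c in parent_row)


if __name__ == "__main__":
    row = transfer_alignment("MK-LV-A", "MKL-VA", "MRLQV-")
    print(row)  # prints MR-LV--
```

Because every pseudogene is projected onto its parent's columns, all members of a family share the Pfam alignment's coordinate system without computing a new multiple alignment; the trade-off in this sketch is that pseudogene-specific insertions are lost.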
