NovelFam3000 – Uncharacterized human protein domains conserved across model organisms

Background: Despite significant efforts from the research community, an extensive portion of the proteins encoded by human genes lack an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, of which many appear in multiple proteins. Once a domain is characterized in one protein, the presence of a similar sequence in an uncharacterized protein serves as a basis for inference of function. Thus knowledge of a domain's function, or the protein within which it arises, can facilitate the analysis of an entire set of proteins.

Description: From the Pfam domain database, we extracted uncharacterized protein domains represented in proteins from humans, worms, and flies. A data centre was created to facilitate the analysis of the uncharacterized domain-containing proteins. The centre both provides researchers with links to dispersed internet resources containing gene-specific experimental data and enables them to post relevant experimental results or comments. For each human gene in the system, a characterization score is posted, allowing users to track the progress of characterization over time or to identify for study uncharacterized domains in well-characterized genes. As a test of the system, a subset of 39 domains was selected for analysis and the experimental results posted to the NovelFam3000 system. For 25 human protein members of these 39 domain families, detailed sub-cellular localizations were determined. Specific observations are presented based on the analysis of the integrated information provided through the online NovelFam3000 system.

Conclusion: Consistent experimental results between multiple members of a domain family allow for inferences of the domain's functional role. We unite bioinformatics resources and experimental data in order to accelerate the functional characterization of scarcely annotated domain families.
