A Resource of Quantitative Functional Annotation for Homo sapiens Genes

The body of human genomic and proteomic evidence continues to grow at an ever-increasing rate, while annotation efforts struggle to keep pace. A surprisingly small fraction of human genes have clear, documented associations with specific functions, and new functions continue to be found for characterized genes. Here we assembled an integrated collection of diverse genomic and proteomic data for 21,341 human genes and made quantitative associations of each gene to 4,333 Gene Ontology terms. We combined guilt-by-profiling and guilt-by-association approaches to exploit features unique to the data types. Performance was evaluated by cross-validation, by prospective validation, and by manual evaluation against the biological literature. Functional-linkage networks (FLNs) were also constructed, and their utility was demonstrated by identifying candidate genes related to a glioma FLN, using a seed network from genome-wide association studies. Our annotations are presented, alongside existing validated annotations, in a publicly accessible and searchable web interface.
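
As a purely illustrative sketch (not the method used in this work), the following Python shows one way a guilt-by-association score from a functional-linkage network could be blended with a guilt-by-profiling score for a single Gene Ontology term. The function names, the weighted-average combination rule, the `weight` parameter, and the gene identifiers are all assumptions made for the example.

```python
from typing import Dict, Set

# Hypothetical functional-linkage network: gene -> {neighbor: linkage weight}.
Network = Dict[str, Dict[str, float]]


def gba_score(gene: str, network: Network, annotated: Set[str]) -> float:
    """Guilt-by-association: share of a gene's total linkage weight
    that connects it to neighbors already annotated with the term."""
    neighbors = network.get(gene, {})
    total = sum(neighbors.values())
    if total == 0:
        return 0.0  # no linkage evidence for this gene
    hit = sum(w for n, w in neighbors.items() if n in annotated)
    return hit / total


def combine(p_profile: float, p_assoc: float, weight: float = 0.5) -> float:
    """Blend a guilt-by-profiling score with a guilt-by-association score.
    A simple weighted average is assumed here for illustration only."""
    return weight * p_profile + (1.0 - weight) * p_assoc


# Toy data with made-up gene names and weights.
toy_network: Network = {"G1": {"G2": 0.8, "G3": 0.2}}
toy_annotated = {"G2"}

assoc = gba_score("G1", toy_network, toy_annotated)
combined = combine(p_profile=0.6, p_assoc=assoc)
```

In this toy case most of G1's linkage weight points to an annotated neighbor, so the association score raises the combined score above the profiling score alone.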
