Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes.

Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray co-expressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which most of the genes are unknown.
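
The ranking idea described above can be illustrated with a minimal Python sketch. It is not the authors' scoring scheme, which the abstract does not specify. It assumes a toy network of weighted gene pairs and scores each candidate by the summed weight of its links to genes in other loci. The gene names, edge weights, and the functions `rank_candidates` and `fold_enrichment` are all hypothetical. The final lines do a back-of-envelope check on the reported figures: dividing the 54% detection rate by the 2.8-fold increase implies a random baseline of roughly 19%.

```python
from collections import defaultdict


def rank_candidates(loci, edges, top_k=5):
    """Return the top_k genes per locus, ranked by cross-locus link weight.

    loci  : dict mapping a locus name to its list of candidate genes
    edges : dict mapping a (gene_a, gene_b) pair to an interaction weight
    Assumption: a simple summed-weight score stands in for the real method.
    """
    locus_of = {gene: name for name, genes in loci.items() for gene in genes}
    scores = defaultdict(float)
    for (a, b), weight in edges.items():
        # Count only links between genes that sit in different loci.
        if a in locus_of and b in locus_of and locus_of[a] != locus_of[b]:
            scores[a] += weight
            scores[b] += weight
    ranked = {}
    for name, genes in loci.items():
        # Highest score first; ties are broken alphabetically.
        ordered = sorted(genes, key=lambda g: (-scores[g], g))
        ranked[name] = ordered[:top_k]
    return ranked


def fold_enrichment(observed_rate, random_rate):
    """Ratio of the observed detection rate to the random baseline."""
    return observed_rate / random_rate


# Hypothetical toy data: three loci with three candidate genes each.
loci = {
    "locus_1": ["A1", "A2", "A3"],
    "locus_2": ["B1", "B2", "B3"],
    "locus_3": ["C1", "C2", "C3"],
}
edges = {
    ("A2", "B1"): 0.9,
    ("B1", "C3"): 0.8,
    ("A2", "C3"): 0.7,
    ("A1", "A3"): 0.95,  # same locus, so it does not count
    ("B2", "C1"): 0.1,
}

top = rank_candidates(loci, edges, top_k=1)

# Random baseline implied by the figures reported in the abstract.
implied_random_rate = 0.54 / 2.8
```

In this toy example the three mutually linked genes (`A2`, `B1`, `C3`) rise to the top of their respective loci, which mirrors the premise that disease genes from different loci tend to share a pathway.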
