A Computational Method Based on the Integration of Heterogeneous Networks for Predicting Disease-Gene Associations

Identifying disease-causing genes is a fundamental challenge in human health: it is of great importance for improving medical care and provides a better understanding of gene function. Recent computational approaches based on human protein-protein interactions and disease similarities have proved effective for this problem. This paper presents a novel, systematic, global method that integrates two heterogeneous networks to prioritize candidate disease-causing genes. The method builds on the observation that genes causing the same or similar diseases tend to lie close to one another in a protein-protein interaction network. The association score between a query disease and a candidate gene is defined as the weighted sum of the association scores between similar diseases and neighbouring genes. The topological correlation of the two heterogeneous networks can also be incorporated into the score function, and an iterative algorithm is designed to solve the resulting problem. The method was tested with 10-fold cross-validation on all 1,126 diseases that have at least one known causal gene; it ranked the correct gene among the top ten in 622 of the 1,428 cases, significantly outperforming the state-of-the-art method PRINCE. The results were then used to study three multifactorial disorders (breast cancer, Alzheimer disease, and diabetes mellitus type 2), yielding suggestions of novel causal genes and candidate disease-causing subnetworks for further investigation.
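The abstract does not give the score function in closed form, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a generic propagation rule: each disease-gene score is repeatedly replaced by a weighted average over similar diseases and neighbouring genes, blended with the known associations. The names `propagate_scores` and `rank_genes`, the mixing weight `alpha`, the row normalization, and the toy matrices are all hypothetical choices made for illustration.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def propagate_scores(disease_sim, gene_adj, known, alpha=0.5, tol=1e-9, max_iter=1000):
    """Iterate F <- alpha * D F G^T + (1 - alpha) * Y until F stops changing.

    D and G are the row-normalized disease similarity and gene adjacency
    matrices, and Y holds the known disease-gene associations (assumed form).
    """
    d = row_normalize(disease_sim)
    g_t = [list(col) for col in zip(*row_normalize(gene_adj))]  # G transposed
    scores = [row[:] for row in known]
    for _ in range(max_iter):
        # Weighted sum over similar diseases (D) and neighbouring genes (G^T).
        spread = matmul(matmul(d, scores), g_t)
        updated = [
            [alpha * s + (1 - alpha) * y for s, y in zip(s_row, y_row)]
            for s_row, y_row in zip(spread, known)
        ]
        delta = max(
            abs(u - s)
            for u_row, s_row in zip(updated, scores)
            for u, s in zip(u_row, s_row)
        )
        scores = updated
        if delta < tol:
            break
    return scores


def rank_genes(scores, disease_index):
    """Return gene indices for one disease, highest score first."""
    row = scores[disease_index]
    return sorted(range(len(row)), key=lambda j: row[j], reverse=True)


if __name__ == "__main__":
    # Toy data: two similar diseases, three genes on a chain 0-1-2 (with self-loops).
    disease_sim = [[1.0, 0.8], [0.8, 1.0]]
    gene_adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
    known = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # disease 0 is caused by gene 0
    scores = propagate_scores(disease_sim, gene_adj, known)
    print(rank_genes(scores, 1))
```

In this toy setting, disease 1 has no known gene of its own, yet it receives non-zero scores because it is similar to disease 0, and genes closer to gene 0 in the chain score higher. With `alpha` below 1 and row-normalized matrices, each update is a contraction, so the iteration converges regardless of the starting scores.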
