Prioritization of potential candidate disease genes by topological similarity of protein-protein interaction network and phenotype data

Identifying candidate disease genes is important for improving medical care, but the task remains challenging in the post-genomic era. Several computational approaches have been proposed that prioritize potential candidate genes using protein-protein interaction (PPI) networks. However, experimentally derived PPI networks often contain spurious interactions. In this paper, we construct a reliable heterogeneous network by fusing multiple networks: a PPI network reconstructed by topological similarity, a phenotype similarity network, and known associations between diseases and genes. We then devise RWRHN, a random walk-based algorithm that runs on this reliable heterogeneous network to prioritize potential candidate genes for inherited diseases. Leave-one-out cross-validation experiments show that RWRHN outperforms the RWRH and CIPHER methods in inferring disease genes. Furthermore, we use RWRHN to predict novel causal genes for 16 diseases, including breast cancer, diabetes mellitus type 2, and prostate cancer, and to detect disease-related protein complexes. The top predictions are supported by literature evidence.
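To make the idea of a random walk on a heterogeneous gene-phenotype network concrete, the following is a minimal sketch of a generic random walk with restart over a fused network. It is not the RWRHN implementation: the block-matrix construction, the function names (`column_normalize`, `build_transition`, `random_walk_with_restart`), the toy data, and the parameter values (`jump=0.5`, `restart=0.7`) are all illustrative assumptions, and the topological-similarity reconstruction of the PPI network is not modeled here.

```python
import numpy as np

def column_normalize(m):
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    sums = m.sum(axis=0, keepdims=True)
    sums[sums == 0] = 1.0
    return m / sums

def build_transition(gene_adj, pheno_adj, assoc, jump=0.5):
    """Fuse a gene network, a phenotype network and gene-phenotype links.

    assoc is a (genes x phenotypes) 0/1 matrix. `jump` is an assumed
    probability of crossing between the two layers (hypothetical value).
    """
    top = np.hstack([(1 - jump) * column_normalize(gene_adj),
                     jump * column_normalize(assoc)])
    bottom = np.hstack([jump * column_normalize(assoc.T),
                        (1 - jump) * column_normalize(pheno_adj)])
    # Renormalize so nodes without cross-layer links still get stochastic columns.
    return column_normalize(np.vstack([top, bottom]))

def random_walk_with_restart(w, p0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change < tol."""
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy data (invented for illustration): 4 genes on a path, 2 similar phenotypes.
gene_adj = np.array([[0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]], dtype=float)
pheno_adj = np.array([[0, 1],
                      [1, 0]], dtype=float)
assoc = np.zeros((4, 2))
assoc[0, 0] = 1.0  # gene 0 is the known gene for phenotype 0

w = build_transition(gene_adj, pheno_adj, assoc)

# Seed the walk at the query phenotype (index 4 in the fused vector)
# and its known gene (index 0), with equal weight.
p0 = np.zeros(6)
p0[0] = 0.5
p0[4] = 0.5

scores = random_walk_with_restart(w, p0)
gene_scores = scores[:4]
# Rank the candidate genes (all genes except the seed) by steady-state score.
ranking = sorted(range(1, 4), key=lambda g: -gene_scores[g])
print(ranking)
```

In this toy setting, candidates closer to the seed gene receive higher steady-state scores, which is the intuition behind ranking by network proximity. A real application would replace the toy matrices with the reconstructed PPI network, the phenotype similarity network, and curated disease-gene associations.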
