Relating Diseases by Integrating Gene Associations and Information Flow through Protein Interaction Network

Identifying similar diseases could provide a deeper understanding of their underlying causes and may even hint at possible treatments. For this purpose, a similarity measure is needed that reflects the molecular interactions and biological pathways underpinning the diseases. We have therefore devised a network-based measure that can partially fulfill this goal. Our method assigns weights to all proteins (and consequently to their encoding genes) by using information flow from a disease to the protein interaction network and back. The similarity between two diseases is then defined as the cosine of the angle between their corresponding weight vectors. The proposed method also provides a way to suggest disease-pathway associations: the weights assigned to the genes are used to perform enrichment analysis for each disease. By calculating pairwise similarities among 2534 diseases, we show that our disease similarity measure is strongly correlated with the probability of finding two diseases in the same disease family and, more importantly, of their sharing biological pathways. We have also compared our results to those of MimMiner, a text-mining method that assigns pairwise similarity scores to diseases, and find the results of the two methods to be complementary. We further show that clustering diseases based on their similarities and performing enrichment analysis for the cluster centers significantly increases the term association rate, suggesting that the cluster centers are better representatives of biological pathways than the diseases themselves. This supports the view that our similarity measure is a good indicator of the relatedness of the biological processes involved in causing the diseases. Although not needed for understanding this paper, the raw results are available for download for further study at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbpmn/DiseaseRelations/.
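
The similarity step described above, sim(A, B) = (w_A · w_B) / (‖w_A‖ ‖w_B‖), can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each disease's gene weights (however they were obtained from the information-flow computation, which is not reproduced here) are stored as a sparse mapping from a gene identifier to a weight, with absent genes treated as weight 0. The function names, the gene and disease labels, and the convention of returning 0 for an all-zero weight vector are hypothetical choices made for this sketch.

```python
import math
from typing import Dict, Mapping, Tuple


def cosine_similarity(w_a: Mapping[str, float], w_b: Mapping[str, float]) -> float:
    """Cosine of the angle between two sparse gene-weight vectors.

    Each mapping sends a gene identifier to the weight assigned to that gene
    for one disease; genes missing from a mapping are treated as weight 0.
    """
    # Only genes present in both vectors contribute to the dot product.
    dot = sum(w * w_b[g] for g, w in w_a.items() if g in w_b)
    norm_a = math.sqrt(sum(w * w for w in w_a.values()))
    norm_b = math.sqrt(sum(w * w for w in w_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Hypothetical convention: a disease with no weighted genes is
        # treated as unrelated to every other disease.
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(
    weights: Dict[str, Mapping[str, float]],
) -> Dict[Tuple[str, str], float]:
    """Similarity for every unordered pair of diseases.

    Keys are (disease, disease) tuples with the names in sorted order, so each
    pair appears exactly once.
    """
    names = sorted(weights)
    return {
        (a, b): cosine_similarity(weights[a], weights[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Because the cosine depends only on the direction of each weight vector, rescaling all of a disease's weights leaves its similarities unchanged, and when all weights are nonnegative the score lies between 0 and 1. For example, a hypothetical disease weighted equally on two genes and another weighted only on the first of those genes have similarity 1/√2 ≈ 0.707.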
