A network medicine approach to quantify distance between hereditary disease modules on the interactome

We introduce a MeSH-based method that quantifies similarity between heritable diseases at the molecular level. The method brings together existing information about diseases that is scattered across the biomedical literature. We demonstrate that sets of MeSH terms provide a highly descriptive representation of heritable diseases and that the structure of MeSH provides a natural way of combining individual MeSH vocabularies. We also show that our measure can be used effectively to predict candidate disease genes. Finally, we developed a web application for querying more than 28.5 million relationships between 7,574 hereditary diseases (96% of OMIM) based on our similarity measure.
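The abstract does not spell out the similarity measure itself. As an illustration only, the sketch below shows one common way to compare two sets of taxonomy terms: a Resnik-style information-content score for term pairs, combined over term sets with a best-match average. The toy taxonomy, the annotation counts, and the function names are all hypothetical, and the choice of Resnik similarity and best-match averaging is an assumption rather than the method of this paper.

```python
import math

# Hypothetical toy taxonomy (child -> parent); these are NOT real MeSH entries.
PARENT = {
    "hemorrhagic_disorder": "blood_disease",
    "thrombosis": "blood_disease",
    "blood_disease": "disease",
    "skin_disease": "disease",
    "disease": None,
}

# Hypothetical counts of how often each term is used directly as an annotation.
COUNTS = {
    "hemorrhagic_disorder": 2,
    "thrombosis": 2,
    "blood_disease": 1,
    "skin_disease": 3,
    "disease": 0,
}


def ancestors(term):
    """Return the term itself followed by all of its ancestors up to the root."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def information_content():
    """IC(t) = -log p(t), where p(t) counts annotations of t and its descendants."""
    total = sum(COUNTS.values())
    cumulative = dict.fromkeys(PARENT, 0)
    for term, n in COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += n
    return {t: -math.log(c / total) for t, c in cumulative.items()}


def resnik(a, b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(ic[t] for t in common)


def disease_similarity(terms_a, terms_b, ic):
    """Best-match average over two term sets (an assumed way to combine scores)."""
    best_a = [max(resnik(a, b, ic) for b in terms_b) for a in terms_a]
    best_b = [max(resnik(a, b, ic) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(terms_a) + len(terms_b))


IC = information_content()
score = disease_similarity(
    {"hemorrhagic_disorder"}, {"thrombosis", "skin_disease"}, IC
)
```

In this toy setup, two terms that share only the root have similarity zero, while terms under a rarer common ancestor score higher. Any real use would need actual MeSH tree structure and term frequencies.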
