Network Analysis of Human Disease Comorbidity Patterns Based on Large-Scale Data Mining

Disease comorbidity is an important aspect of phenotype associations and reflects overlapping pathogenesis between diseases. Existing comorbidity studies have usually focused on specific diseases and patient populations. In this study, we systematically mined and analyzed disease comorbidity patterns without restricting disease types or patient populations. We present a data mining approach that extracts comorbidity patterns from a patient-disease database in the drug adverse event reporting system. The database contains records of 3,354,043 patients. We first demonstrated that the data are not severely biased toward specific patient populations and are valuable for comorbidity mining. We then developed an automatic pipeline to process the data and applied an association rule mining algorithm to mine comorbidity relationships among multiple diseases. Our approach extracted 8,576 comorbidity patterns for 613 diseases. We constructed a disease comorbidity network from these patterns and demonstrated that the comorbidity clusters reflect genetic associations between diseases. Unlike previous studies based on relative risk, a measure that tends to identify comorbidities for rare diseases, our approach extracted many patterns for common diseases. We applied the approach to colorectal cancer and found interesting relationships between colorectal cancer and metabolic disorders, which may lead to new insights into pathogenesis.
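The contrast between association rule mining and relative risk can be illustrated with a minimal sketch. It is not the pipeline used in the study: the patient records, disease names, thresholds, and function names below are all hypothetical, and pairs are enumerated by brute force rather than with a dedicated frequent-pattern algorithm. Relative risk is computed here as observed over expected co-occurrence, which is the same quantity that association rule mining calls lift.

```python
from itertools import combinations

# Hypothetical toy records: each set lists the diseases reported for one patient.
patients = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "obesity"},
    {"diabetes", "obesity"},
    {"hypertension"},
    {"hypertension", "diabetes"},
    {"rare_a", "rare_b"},
    {"obesity"},
    {"hypertension", "obesity"},
]


def support(itemset, records):
    """Fraction of records that contain every disease in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= r for r in records) / len(records)


def pair_stats(a, b, records):
    """Return (support, confidence, relative risk) for the rule a -> b."""
    s_ab = support({a, b}, records)
    s_a = support({a}, records)
    s_b = support({b}, records)
    confidence = s_ab / s_a             # estimate of P(b | a)
    relative_risk = s_ab / (s_a * s_b)  # observed / expected co-occurrence
    return s_ab, confidence, relative_risk


def mine_pairs(records, min_support=0.25, min_confidence=0.5):
    """Keep directed pair rules that pass both (assumed) thresholds."""
    diseases = sorted(set().union(*records))
    rules = []
    for a, b in combinations(diseases, 2):
        for x, y in ((a, b), (b, a)):
            s, c, rr = pair_stats(x, y, records)
            if s >= min_support and c >= min_confidence:
                rules.append((x, y, s, c, rr))
    return rules


if __name__ == "__main__":
    for x, y, s, c, rr in mine_pairs(patients):
        print(f"{x} -> {y}: support={s:.3f} confidence={c:.2f} RR={rr:.2f}")
    print("rare pair:", pair_stats("rare_a", "rare_b", patients))
```

In this toy data the rare pair has the highest relative risk, because its expected co-occurrence is tiny, yet it fails the support threshold. The rules that survive involve the common diseases, which mirrors the behavior described above.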
