CLASH: Complementary Linkage with Anchoring and Scoring for Heterogeneous biomolecular and clinical data

Background
Disease-disease associations are increasingly viewed and analyzed as a network. In such a network, the connections between diseases are constructed from source information on interactome maps of biomolecules such as genes, proteins, and metabolites. Abundant source information yields denser connections between diseases. However, certain groups of diseases, such as metabolic diseases, remain sparsely connected because their source information is insufficient: a large proportion of their associated genes are still unknown. One way to circumvent this lack of source information is to integrate available external information using one of the current integration or fusion methods. However, integration may not be appropriate if one wants a disease network that places strong emphasis on the original data source and uses external sources only to complement it. Interpretation of an integrated network would be ambiguous, because fusing information blurs the meaning of its edges.

Methods
In this study, we propose a network-based algorithm that complements the original network with external information while preserving the network's originality. The proposed algorithm uses complementary information from an external data source to link disconnected nodes to the disease network. It does so in four steps: anchoring, connecting, scoring, and stopping.

Results
We applied the algorithm to a network of metabolic diseases sourced from protein-protein interaction data. The external information came from text mining of the PubMed comorbidity literature. The algorithm recovered 97% of the connections and improved the AUC from 0.55 to as much as 0.71. Experimental results also show that the proposed algorithm is robust to noisy external information.

Conclusion
The novelty of this research is that the proposed algorithm preserves the network's originality while complementing it with external information. Furthermore, it can be used both to recover original associations and to discover novel associations in a disease network.
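
The four steps named in the Methods (anchoring, connecting, scoring, and stopping) are not detailed in this abstract. The Python sketch below shows one way such a loop could work for a single disconnected node. It is a hypothetical illustration, not the authors' CLASH implementation. It assumes the following:

- Anchors are the node's neighbors in the external network that already have edges in the original network.
- Candidate edges are tried in descending order of external weight.
- `score` is a user-supplied function that rates the whole network.
- The loop stops at the first added edge that does not improve the score.

The names `link_isolated_node` and `connected_fraction` are illustrative only.

```python
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]                  # original network: undirected adjacency sets
WeightedGraph = Dict[str, Dict[str, float]]  # external network: neighbor -> edge weight


def link_isolated_node(
    original: Graph,
    external: WeightedGraph,
    node: str,
    score: Callable[[Graph], float],
) -> Tuple[Graph, List[str]]:
    """Hypothetical anchoring/connecting/scoring/stopping loop for one node.

    Not the CLASH implementation; every rule below is an assumption.
    Returns a copy of the network with the accepted edges, plus the anchors used.
    """
    # Anchoring (assumed rule): external neighbors of `node` that already have
    # at least one edge in the original network, strongest external link first.
    candidates = external.get(node, {})
    existing = original.get(node, set())
    anchors = sorted(
        (a for a in candidates if a != node and a not in existing and original.get(a)),
        key=lambda a: candidates[a],
        reverse=True,
    )

    # Work on a copy so the original network itself is never modified.
    graph: Graph = {n: set(nbrs) for n, nbrs in original.items()}
    graph.setdefault(node, set())
    best = score(graph)
    linked: List[str] = []

    for anchor in anchors:
        # Connecting: tentatively add one complementary edge.
        graph[node].add(anchor)
        graph[anchor].add(node)
        # Scoring: rate the whole network with the new edge in place.
        new_score = score(graph)
        if new_score <= best:
            # Stopping (assumed rule): undo the edge and stop at the first
            # anchor that does not improve the score.
            graph[node].discard(anchor)
            graph[anchor].discard(node)
            break
        best = new_score
        linked.append(anchor)
    return graph, linked


def connected_fraction(graph: Graph) -> float:
    """Toy score (hypothetical): share of nodes with at least one edge."""
    return sum(1 for nbrs in graph.values() if nbrs) / len(graph)


if __name__ == "__main__":
    original = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
    external = {"D": {"A": 0.9, "C": 0.4, "E": 0.7}}
    graph, linked = link_isolated_node(original, external, "D", connected_fraction)
    print(linked)      # ['A']
    print(graph["D"])  # {'A'}
```

The toy score rewards the fraction of nodes that have at least one edge. With it, the isolated node D is linked to its strongest anchor A. The loop then stops at C, because a second edge does not raise the score. E is never considered because it is absent from the original network. The function works on a copy and only adds edges, so the original network is left untouched. This mirrors the abstract's emphasis on preserving the network's originality.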

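The Results report the AUC rising from 0.55 to as much as 0.71. The sketch below shows what that figure measures, using the pairwise (Mann-Whitney) formulation. AUC is the probability that a randomly chosen true association scores higher than a randomly chosen non-association, with ties counted as one half. The function name `auc` and the toy data are illustrative. The abstract does not state the exact evaluation protocol behind the reported numbers.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via pairwise comparison, i.e. Mann-Whitney U / (n_pos * n_neg).

    labels: 1 for a known association, 0 for a non-association.
    Runs in O(n_pos * n_neg) time, which is fine for small examples.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to perfect separation. The reported lift from 0.55 to 0.71 should be read on that scale.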