Cross-Network Clustering and Cluster Ranking for Medical Diagnosis

Automating medical diagnosis is an important data mining problem: given a set of observed symptoms, infer the likely disease(s). Algorithms for this problem are valuable as a supplement to an actual diagnosis. Existing diagnosis methods typically perform the inference on a sparse bipartite graph whose two sets of nodes represent diseases and symptoms, respectively. By relying on this graph alone, existing methods implicitly assume that no direct dependency exists between diseases (or between symptoms), which may not hold in reality. To address this limitation, we introduce two domain networks, one encoding similarities between diseases and the other encoding similarities between symptoms, to avoid information loss and to alleviate the sparsity of the bipartite graph. Based on the domain networks and the bipartite graph bridging them, we develop a novel algorithm, CCCR, that performs diagnosis by ranking symptom-disease clusters. Compared with existing approaches, CCCR is more accurate, and it is more interpretable because its results convey how the inferred diseases are categorized. Experimental results on real-life datasets demonstrate the effectiveness of the proposed method.
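
As a rough illustration of the inputs described above, the sketch below builds a toy symptom-disease bipartite graph and two toy domain networks, then scores diseases using the bipartite graph alone. All names and values are hypothetical and do not come from the paper. The scoring function is a simple overlap count standing in for a bipartite-only baseline; it is not CCCR, and it deliberately ignores the two similarity networks.

```python
# Hypothetical toy data; none of these names or values come from the paper.
diseases = ["flu", "measles"]

# Bipartite graph: each symptom maps to the set of diseases it is linked to.
bipartite = {
    "fever": {"flu", "measles"},
    "cough": {"flu"},
    "rash": {"measles"},
}

# Domain networks: pairwise similarities within each node set.
# They are shown only to illustrate the extra inputs; the baseline below
# does not use them.
disease_similarity = {("flu", "measles"): 0.3}
symptom_similarity = {("fever", "cough"): 0.6, ("fever", "rash"): 0.4}


def rank_by_bipartite_only(observed):
    """Score each disease by how many observed symptoms link to it."""
    scores = {d: 0 for d in diseases}
    for symptom in observed:
        for disease in bipartite.get(symptom, ()):
            scores[disease] += 1
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_by_bipartite_only(["fever", "cough"])
```

Because the baseline treats every disease and every symptom as independent, it cannot use the fact that two symptoms or two diseases are similar, which is the kind of information the domain networks are meant to supply.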
