Clustering medical data to predict the likelihood of diseases

Several studies show that background knowledge of a domain can improve the results of clustering algorithms. In this paper, we illustrate how background knowledge of the medical domain can be used in the clustering process to predict the likelihood of diseases. To do so, clustering must be based on the anticipated likelihood attributes together with the core disease attributes of each data point. For this purpose, we propose a constrained k-Means-Mode clustering algorithm. Because medical data contain both continuous and categorical attributes, the algorithm handles both continuous and discrete data while clustering on the anticipated likelihood and core disease attributes. We demonstrate its effectiveness by testing it on a real-world patient data set.
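The abstract does not spell out the algorithm, so the following is only a minimal sketch of a generic k-Means-Mode iteration for mixed data, in the style of Huang's k-prototypes. Numeric attributes use squared Euclidean distance with a mean update, and categorical attributes use a mismatch count with a mode update. The paper's constraints on anticipated likelihood and core attributes are not described here and are deliberately omitted. The function names and the categorical weight `gamma` are hypothetical.

```python
from collections import Counter
import random


def mixed_distance(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Squared Euclidean distance on numeric attributes plus gamma times the
    number of mismatched categorical attributes (hypothetical weighting)."""
    num = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    cat = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return num + gamma * cat


def k_means_mode(num_data, cat_data, k, gamma=1.0, max_iter=100, seed=0):
    """Unconstrained k-Means-Mode sketch; the paper's constraints are omitted.
    num_data: list of numeric tuples; cat_data: list of categorical tuples."""
    rng = random.Random(seed)
    n = len(num_data)
    init = rng.sample(range(n), k)  # pick k distinct points as initial centers
    centers_num = [list(num_data[i]) for i in init]
    centers_cat = [list(cat_data[i]) for i in init]
    labels = [-1] * n
    for _ in range(max_iter):
        # Assignment step: each point goes to the nearest center.
        new_labels = [
            min(range(k), key=lambda j: mixed_distance(
                num_data[i], cat_data[i], centers_num[j], centers_cat[j], gamma))
            for i in range(n)
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: mean for numeric attributes, mode for categorical ones.
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:
                continue  # keep the previous center for an empty cluster
            centers_num[j] = [
                sum(num_data[i][d] for i in members) / len(members)
                for d in range(len(num_data[0]))
            ]
            centers_cat[j] = [
                Counter(cat_data[i][d] for i in members).most_common(1)[0][0]
                for d in range(len(cat_data[0]))
            ]
    return labels, centers_num, centers_cat


if __name__ == "__main__":
    # Toy records (illustrative only): two numeric and two categorical attributes.
    num = [(1.0, 2.0), (1.2, 1.8), (8.0, 9.0), (8.2, 9.1)]
    cat = [("yes", "male"), ("yes", "male"), ("no", "female"), ("no", "female")]
    labels, _, _ = k_means_mode(num, cat, k=2)
    print(labels)
```

The mean/mode split is what lets a single loop handle both data types: a mean is undefined for categorical values, so the most frequent value stands in as the cluster representative. A constrained variant would additionally restrict the assignment step, for example by how core disease attributes may be grouped, but the exact rule would have to come from the paper itself.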