Efficient genetic K-Means clustering for health care knowledge discovery

Data mining and machine learning are becoming prominent research areas and are increasingly popular in health organizations. Applying data mining can extract hidden patterns from patient data. Data mining techniques and tools are very helpful because they provide health care professionals with significant knowledge to support decision-making. Researchers have demonstrated the utility of data mining techniques such as clustering, classification, and regression in the health care domain. In particular, clustering algorithms help researchers discover new insights by segmenting patients into groups so that effective treatments can be provided to them. This paper reviews existing clustering methods and presents an efficient K-Means clustering algorithm that uses the Self-Organizing Map (SOM) method to overcome the problem of determining the number of centroids in traditional K-Means. SOM-based clustering is efficient owing to its unsupervised learning and topology-preserving properties. The two-stage clustering algorithm uses SOM to produce prototypes in the first stage and then uses those prototypes to create clusters in the second stage. Two health care datasets are used in the experiments, and a cluster accuracy metric is applied to evaluate the performance of the algorithm. Our analysis shows that the proposed method is accurate and achieves better clustering performance, while also yielding valuable insights for each cluster. Our approach is unsupervised and scalable, and it can be applied to various domains.
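The two-stage idea described above can be illustrated with a minimal pure-Python sketch. It makes several assumptions that are not taken from the paper:

- The first stage uses a small rectangular SOM grid with a Gaussian neighbourhood and linearly decaying learning rate and radius.
- The second stage uses plain Lloyd's K-Means with farthest-first seeding.
- The number of clusters `k` is supplied by the caller. The abstract does not say how the method derives it from the SOM, so this is not the authors' procedure.
- `cluster_accuracy` assumes one common definition: the best one-to-one matching of clusters to known classes. The abstract does not name the exact metric.

All function names here (`train_som`, `kmeans`, `som_kmeans`, `cluster_accuracy`) are hypothetical.

```python
import math
import random
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu(protos, x):
    """Index of the best-matching unit (nearest prototype) for sample x."""
    return min(range(len(protos)), key=lambda j: sq_dist(protos[j], x))


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    """Stage 1 (assumed setup): train a rows x cols SOM, return its prototypes."""
    rng = random.Random(seed)
    protos = [list(rng.choice(data)) for _ in range(rows * cols)]  # init from samples
    coords = [(i // cols, i % cols) for i in range(rows * cols)]   # grid positions
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
            br, bc = coords[bmu(protos, x)]
            for j, (r, c) in enumerate(coords):
                # Gaussian neighbourhood on the grid: nodes near the BMU move most.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                protos[j] = [p + lr * h * (xi - p) for p, xi in zip(protos[j], x)]
            step += 1
    return protos


def kmeans(points, k, iters=100):
    """Stage 2: Lloyd's K-Means with farthest-first seeding; returns labels."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centroid if a cluster ends up empty.
            new.append([sum(v) / len(members) for v in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels


def som_kmeans(data, k, rows=4, cols=4, seed=0):
    """Two-stage clustering: SOM prototypes first, then K-Means on the prototypes."""
    protos = train_som(data, rows, cols, seed=seed)
    proto_labels = kmeans(protos, k)
    # Each sample inherits the cluster of its best-matching prototype.
    return [proto_labels[bmu(protos, x)] for x in data]


def cluster_accuracy(true, pred):
    """Fraction of samples correct under the best one-to-one cluster-to-class mapping."""
    classes = sorted(set(true))
    clusters = sorted(set(pred))
    targets = classes + [None] * max(0, len(clusters) - len(classes))  # allow extra clusters
    best = 0
    for perm in permutations(targets, len(clusters)):  # brute force: fine for small k
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[p] == t for p, t in zip(pred, true)))
    return best / len(true)
```

Running K-Means on a few SOM prototypes instead of on every sample keeps the second stage cheap. Each sample then takes the label of its nearest prototype. The brute-force matching in `cluster_accuracy` grows factorially with `k`, so the Hungarian algorithm is the usual replacement for larger numbers of clusters.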
