An improved overlapping k-means clustering method for medical applications

The sensitivity of the overlapping k-means algorithm to initialization is considered. The k-harmonic means method is effective for identifying initial cluster centroids. The proposed approach outperforms the original overlapping k-means algorithm.

Data clustering has proven to be an effective method for discovering structure in medical datasets. The majority of clustering algorithms produce exclusive clusters, meaning that each sample can belong to only one cluster. However, most real-world medical datasets contain inherently overlapping information, which is best explained by overlapping clustering methods that allow one sample to belong to more than one cluster. One of the simplest and most efficient overlapping clustering methods is overlapping k-means (OKM), an extension of the traditional k-means algorithm. As an extension of k-means, the OKM method is likewise sensitive to the choice of initial cluster centroids. In this paper, we propose a hybrid method that combines the k-harmonic means and overlapping k-means algorithms (KHM-OKM) to overcome this limitation. The main idea behind the KHM-OKM method is to use the output of the KHM method to initialize the cluster centers of the OKM method. We tested the proposed method using the FBCubed metric, which has been shown to be the most effective measure for evaluating overlapping clustering algorithms with respect to homogeneity, completeness, rag bag, and the cluster size-quantity tradeoff. According to results on ten publicly available medical datasets, the KHM-OKM algorithm outperforms the original OKM algorithm and can be used as an efficient method for clustering medical datasets.
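The hand-off described above, in which KHM centers seed OKM, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `khm_centers` and `okm_assign`, the toy 2-D data, the starting centers, and the parameter values (`p = 2`, a fixed iteration count, the `eps` floor) are all assumptions made for illustration. The sketch runs the standard k-harmonic means center update and then applies only the OKM multi-assignment heuristic once to the resulting centers; the full OKM loop, which alternates assignment with a centroid update, is omitted.

```python
import math

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def khm_centers(points, centers, p=2.0, iters=50, eps=1e-9):
    """k-harmonic means center updates (p and iters are illustrative choices)."""
    dim = len(points[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in centers]
        den = [0.0] * len(centers)
        for x in points:
            # Distances are floored at eps to avoid division by zero.
            d = [max(math.sqrt(sqdist(x, c)), eps) for c in centers]
            s = sum(dk ** -p for dk in d)
            for k, dk in enumerate(d):
                # KHM weight of point x for center k.
                w = dk ** (-p - 2) / (s * s)
                den[k] += w
                for j in range(dim):
                    num[k][j] += w * x[j]
        centers = [[n / den[k] for n in num[k]] for k in range(len(centers))]
    return centers

def okm_assign(x, centers):
    """OKM-style multi-assignment: add the next-nearest center while the
    mean of the assigned centers (the 'image' of x) moves closer to x."""
    order = sorted(range(len(centers)), key=lambda k: sqdist(x, centers[k]))
    assigned = [order[0]]
    best = sqdist(x, centers[order[0]])
    for k in order[1:]:
        trial = assigned + [k]
        image = [sum(centers[c][j] for c in trial) / len(trial)
                 for j in range(len(x))]
        d = sqdist(x, image)
        if d < best:
            assigned, best = trial, d
        else:
            break
    return sorted(assigned)

# Toy data (assumed): two tight groups plus one sample lying between them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (5.5, 5.5)]
init = [[1.0, 2.0], [9.0, 8.0]]          # arbitrary starting centers

centers = khm_centers(points, init)       # step 1: KHM supplies the centers
cover = [okm_assign(x, centers) for x in points]  # step 2: OKM assignment

for x, clusters in zip(points, cover):
    print(x, "->", clusters)
```

In this toy run the sample lying between the two groups is the natural candidate for membership in both clusters, which is the overlapping behaviour that OKM is designed to capture.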

[17]  José Francisco Martínez Trinidad,et al.  Study of Overlapping Clustering Algorithms Based on Kmeans through FBcubed Metric , 2014, MCPR.

[18]  Xiaogang Wang,et al.  A roadmap of clustering algorithms: finding a match for a biomedical application , 2008, Briefings Bioinform..

[19]  Abu Sayed Md. Latiful Hoque,et al.  Clustering medical data to predict the likelihood of diseases , 2010, 2010 Fifth International Conference on Digital Information Management (ICDIM).

[20]  Guillaume Cleuziou,et al.  An extended version of the k-means method for overlapping clustering , 2008, 2008 19th International Conference on Pattern Recognition.

[21]  Julio Gonzalo,et al.  A comparison of extrinsic clustering evaluation metrics based on formal constraints , 2008, Information Retrieval.

[22]  Guillaume Cleuziou,et al.  A Generalization of k-Means for Overlapping Clustering , 2007 .

[23]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[24]  P. Kalyani,et al.  Approaches to Partition Medical Data using Clustering Algorithms , 2012 .

[25]  Xuxun Liu,et al.  A Survey on Clustering Routing Protocols in Wireless Sensor Networks , 2012, Sensors.

[26]  R. Bretzel,et al.  Comorbidity of diabetes mellitus and hypertension in the clinical setting: A review of prevalence, pathophysiology, and treatment perspectives , 2007 .

[27]  F. A. da Veiga,et al.  Structure discovery in medical databases: a conceptual clustering approach , 1996, Artif. Intell. Medicine.

[28]  Ka-Chun Wong,et al.  A Short Survey on Data Clustering Algorithms , 2015, 2015 Second International Conference on Soft Computing and Machine Intelligence (ISCMI).

[29]  Ameer Ahmed Abbasi,et al.  A survey on clustering algorithms for wireless sensor networks , 2007, Comput. Commun..

[30]  Rebecca Nugent,et al.  An overview of clustering applied to molecular biology. , 2010, Methods in molecular biology.

[31]  Nadia Essoussi,et al.  Overlapping Patterns Recognition with Linear and Non-Linear Separations using Positive Definite Kernels , 2012 .

[32]  Joseph N. Khamalah,et al.  Using Cluster Analysis for Medical Resource Decision Making , 1995, Medical decision making : an international journal of the Society for Medical Decision Making.

[33]  Rui Xu,et al.  Survey of clustering algorithms , 2005, IEEE Transactions on Neural Networks.

[34]  S. Dagogo-Jack,et al.  Comorbidities of Diabetes and Hypertension: Mechanisms and Approach to Target Organ Protection , 2011, Journal of clinical hypertension.

[35]  Rui Xu,et al.  Clustering Algorithms in Biomedical Research: A Review , 2010, IEEE Reviews in Biomedical Engineering.

[36]  Barry Byrne,et al.  Diagnostic challenges for Pompe disease: An under-recognized cause of floppy baby syndrome , 2006, Genetics in Medicine.