A Simple Density with Distance Based Initial Seed Selection Technique for K Means Algorithm

Open issues with the K-means algorithm include identifying the number of clusters, selecting initial seed concepts, assessing clustering tendency, handling empty clusters, and identifying outliers. In this paper we propose a novel, simple technique that considers both the density and the distance of the concepts in a dataset to identify initial seed concepts for clustering. Many authors have proposed techniques for identifying initial seed concepts; our method ensures that the initial seed concepts are chosen from different clusters that are to be generated by the clustering solution. The hallmark of our algorithm is that it is a single-pass algorithm that does not require any extra parameters to be estimated. Further, each of our seed concepts is one of the actual concepts in the dataset, not the mean of representative concepts as is the case in many other algorithms. We have implemented the proposed algorithm and compared its results with the interval-based technique of Fouad Khan, and we find that our method outperforms the interval-based method. We have also compared our method with the original random K-means and K-means++ algorithms.
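
The abstract does not spell out the procedure, so the following is only an illustrative sketch of how density and distance might be combined to pick seeds from the actual concepts; it is not the algorithm of this paper. The function name `density_distance_seeds`, the density proxy (number of neighbours within the mean pairwise distance), and the score (density times distance to the nearest chosen seed) are all assumptions made for the example. The sketch computes all pairwise distances, so it does not reproduce the single-pass property claimed above.

```python
import math


def density_distance_seeds(points, k):
    """Pick k seeds from the actual points (illustrative heuristic only)."""
    n = len(points)
    if not 1 <= k <= n or n < 2:
        raise ValueError("need at least 2 points and 1 <= k <= len(points)")

    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed density proxy: neighbours within the mean pairwise distance.
    # The radius is derived from the data, so no user parameter is needed.
    radius = sum(map(sum, dist)) / (n * (n - 1))
    density = [sum(1 for d in row if d <= radius) for row in dist]

    # First seed: the densest point (ties go to the lowest index).
    seeds = [max(range(n), key=lambda i: density[i])]

    while len(seeds) < k:
        # Assumed score: a candidate must be both dense and far from
        # every seed chosen so far. Chosen seeds score 0.
        def score(i):
            return density[i] * min(dist[i][s] for s in seeds)

        seeds.append(max(range(n), key=score))

    # Seeds are members of the dataset, not computed means.
    return [points[i] for i in seeds]
```

Because each returned seed is an element of `points`, the result can be passed directly as the initial centres of any K-means implementation that accepts explicit initial centroids.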

[1] Michelle Effros, et al. Deterministic clustering with data nets, 2004, Electron. Colloquium Comput. Complex.

[2] Sergei Vassilvitskii, et al. How slow is the k-means method?, 2006, SCG '06.

[3] D. Edla, et al. Enhanced K-Means Clustering Algorithm using A Heuristic Approach, 2014.

[4] S. P. Lloyd, et al. Least squares quantization in PCM, 1982, IEEE Trans. Inf. Theory.

[5] Jiye Liang, et al. An initialization method for the K-Means algorithm using neighborhood model, 2009, Comput. Math. Appl.

[6] Md Zahidul Islam, et al. DenClust: A Density Based Seed Selection Approach for K-Means, 2014, ICAISC.

[7] Bidyut Baran Chaudhuri, et al. A novel multiseed nonhierarchical data clustering technique, 1997, IEEE Trans. Syst. Man Cybern. Part B.

[8] Sergei Vassilvitskii, et al. k-means++: the advantages of careful seeding, 2007, SODA '07.

[9] M. M. Astrahan. Speech analysis by clustering, or the hyperphoneme method, 1970.

[10] Md Zahidul Islam, et al. ModEx and Seed-Detective: Two novel techniques for high quality clustering by using good initial seeds in K-Means, 2015, J. King Saud Univ. Comput. Inf. Sci.

[11] Anil K. Jain, et al. Data clustering: a review, 1999, CSUR.

[12] Fouad Khan, et al. An initial seed selection algorithm for k-means clustering of georeferenced data to improve replicability of cluster assignments for mapping application, 2016, Appl. Soft Comput.

[13] M. Elter, et al. An automatic histogram-based initializing algorithm for K-means clustering in CT, 2013.

[14] K. Karteeka Pavan, et al. Single Pass Seed Selection Algorithm for k-Means, 2010.