An Approach to Determine the Number of Clusters for Clustering Algorithms

When clustering a dataset, the right number k of clusters is often not obvious, and choosing k automatically is a difficult problem. This paper first reviews existing methods for selecting the number of clusters. It then presents an improved algorithm that learns k while clustering. The algorithm relies on two coefficients, α and β, that influence this selection. A new measure is also proposed to confirm cluster membership. Finally, we evaluate the computational complexity of the algorithm and apply it to real datasets; the results show its efficiency.
