Paper information - An Empirical Evaluation of Different Initializations on the Number of K-Means Iterations

This paper presents an analysis of the number of iterations K-Means takes to converge under different initializations. We experimented with seven initialization algorithms on a total of 37 real and synthetic datasets. We found that hierarchical-based initializations tend to be most effective at reducing the number of iterations, especially a divisive algorithm using the Ward criterion when applied to real datasets.
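
The kind of measurement described above can be illustrated with a minimal sketch that counts K-Means iterations for two initializations on toy data. Everything in it is an assumption made for illustration and not the paper's code, datasets, or exact algorithms: the helper names (`sq_dist`, `kmeans_iterations`, `ward_init`) are hypothetical, the data are three synthetic Gaussian blobs, and the hierarchical initialization is an agglomerative Ward procedure, not the divisive Ward variant the abstract highlights. An iteration is counted here as one assignment pass, including the final pass that confirms no label changed.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_iterations(points, centroids, max_iter=100):
    """Run Lloyd's K-Means from the given centroids.

    Returns (number of assignment passes until labels stop changing, labels).
    """
    centroids = [tuple(c) for c in centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            return iteration, labels
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return max_iter, labels

def ward_init(points, k):
    """Agglomerative Ward merging down to k clusters; returns their centroids.

    Assumption: this is a simple O(n^3) stand-in for a hierarchical
    initialization, not the divisive algorithm evaluated in the paper.
    """
    clusters = [(1, tuple(p)) for p in points]  # (size, centroid)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (na, ca), (nb, cb) = clusters[i], clusters[j]
                # Ward cost: increase in within-cluster sum of squares.
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (na, ca), (nb, cb) = clusters[i], clusters[j]
        merged = (na + nb, tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [c for _, c in clusters]

# Toy data (an assumption): three well-separated 2-D Gaussian blobs, 20 points each.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (30.0, 0.0), (0.0, 30.0)]
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
          for cx, cy in blob_centers for _ in range(20)]

random_iters, random_labels = kmeans_iterations(points, rng.sample(points, 3))
ward_iters, ward_labels = kmeans_iterations(points, ward_init(points, 3))
print("random init:", random_iters, "iterations")
print("Ward init:  ", ward_iters, "iterations")
```

On data this cleanly separated, the Ward-style start already coincides with a K-Means fixed point, so only the confirming pass remains. A single toy run like this says nothing about the paper's 37 datasets; a fair comparison would average over many random restarts and would also account for the cost of the initialization itself.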

Renato Cordeiro de Amorim | R. C. D. Amorim
