Paper information - A two-phase K-means algorithm for large datasets

Abstract: One drawback of the K-means algorithm is that it needs several iterations over a dataset before it converges on a solution, which limits its application to relatively small datasets. This paper presents a scalable version of the K-means algorithm that employs a buffering technique. The new algorithm, Two-Phase K-means, can robustly find a good solution in only one iteration.
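
The abstract does not spell out how the buffering works, so the following is only a rough sketch of one way a buffered, single-pass K-means could be organised, not the authors' published procedure. It assumes that phase one compresses each buffer into a few weighted centroids and that phase two clusters those summaries. The names `weighted_kmeans`, `two_phase_kmeans`, `buffer_size` and `k_buffer` are hypothetical.

```python
import random


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Lloyd's K-means on weighted points; returns (centroids, cluster_weights)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest centroid (squared Euclidean distance).
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Move each non-empty cluster's centroid to its weighted mean.
        for j in range(k):
            if totals[j] > 0:
                centroids[j] = [s / totals[j] for s in sums[j]]
    return centroids, totals


def two_phase_kmeans(stream, k, buffer_size=200, k_buffer=10):
    """Assumed scheme: summarise each buffer, then cluster the summaries."""
    summaries, summary_weights, buffer = [], [], []

    def flush():
        # Phase 1 (assumption): compress the buffer into at most k_buffer
        # centroids, each weighted by the number of points it absorbed.
        if not buffer:
            return
        kb = min(k_buffer, len(buffer))
        cents, tots = weighted_kmeans(buffer, [1.0] * len(buffer), kb)
        for c, t in zip(cents, tots):
            if t > 0:
                summaries.append(c)
                summary_weights.append(t)
        buffer.clear()

    for p in stream:  # every raw point is read exactly once
        buffer.append(tuple(p))
        if len(buffer) == buffer_size:
            flush()
    flush()

    # Phase 2 (assumption): weighted K-means over the in-memory summaries.
    centroids, _ = weighted_kmeans(summaries, summary_weights, k)
    return centroids
```

In this sketch only one buffer and the list of summaries are held in memory at a time, which is the property that would let such a scheme handle datasets too large for repeated full passes. The data must yield at least `k` summaries for phase two to run.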

Duc Truong Pham | Stefan Simeonov Dimov | C. D. Nguyen

[1] D. Pham, et al. An Incremental K-means algorithm, 2004.

[2] Charles Elkan, et al. Scalability for clustering algorithms revisited, 2000, SKDD.

[3] Paul S. Bradley, et al. Scaling Clustering Algorithms to Large Databases, 1998, KDD.

[4] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.