A two-phase K-means algorithm for large datasets
Abstract: One drawback of the K-means algorithm is that it needs several iterations over the dataset before it converges on a solution, which limits its application to relatively small datasets. This paper presents a scalable version of the K-means algorithm that employs a buffering technique. The new algorithm, Two-Phase K-means, can robustly find a good solution in only one iteration over the dataset.
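The abstract does not spell out how the buffering works, so the following is only a minimal sketch of one plausible reading, not the authors' published procedure. It assumes that phase one reads the data once in fixed-size buffers and compresses each buffer into a few weighted centroids, and that phase two runs a weighted K-means over those summaries alone. The names `two_phase_kmeans`, `weighted_kmeans`, `buffer_size`, and `k_buffer` are hypothetical and chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Lloyd-style K-means on weighted points.

    Returns (centroids, mass), where mass[j] is the total weight
    assigned to centroid j in the final assignment step.
    """
    if not points:
        return [], []
    rng = random.Random(seed)
    k = min(k, len(points))
    dim = len(points[0])
    centroids = [list(p) for p in rng.sample(points, k)]
    mass = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest centroid.
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        for j in range(k):
            if mass[j] > 0:  # leave empty clusters where they are
                centroids[j] = [s / mass[j] for s in sums[j]]
    return centroids, mass


def two_phase_kmeans(stream, k, buffer_size=500, k_buffer=None, seed=0):
    """Assumed two-phase scheme: one pass over `stream`, then a final
    clustering over the buffer summaries only."""
    k_buffer = k_buffer or 4 * k  # assumption: oversample clusters per buffer
    summaries, summary_weights = [], []
    buffer = []

    def flush():
        # Phase 1: compress the current buffer into weighted centroids.
        if not buffer:
            return
        cents, mass = weighted_kmeans(
            buffer, [1.0] * len(buffer), k_buffer, seed=seed
        )
        for c, m in zip(cents, mass):
            if m > 0:
                summaries.append(c)
                summary_weights.append(m)
        buffer.clear()

    for point in stream:  # the only pass over the raw data
        buffer.append(list(point))
        if len(buffer) >= buffer_size:
            flush()
    flush()

    # Phase 2: cluster the summaries, weighting each by the points it stands for.
    centroids, _ = weighted_kmeans(summaries, summary_weights, k, seed=seed)
    return centroids
```

Under these assumptions the raw data is touched once, and all repeated iterations run either on a single in-memory buffer or on the much smaller set of summaries. `stream` can be any iterable of numeric vectors, such as a generator reading rows from disk.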