A two-phase K-means algorithm for large datasets

Abstract One of the drawbacks of the K-means algorithm is the need for several iterations over datasets before it converges on a solution. Therefore, its application is limited to relatively small datasets. This paper presents a scalable version of the K-means algorithm that employs a buffering technique. The new algorithm, Two-Phase K-means, can robustly find a good solution in only one iteration.