A Limited-Iteration Bisecting K-Means for Fast Clustering Large Datasets

Bisecting K-means (BKM) clustering, with or without refinement, has been shown to offer higher computing efficiency, better clustering quality, and lower susceptibility to the choice of initial cluster centers than the basic K-means clustering algorithm. In this paper, we investigate a variant of bisecting K-means with refinement that increases efficiency while trying to maintain clustering quality. Our approach is to limit the number of iterations of the two-means step (K-means with K=2) used to bisect a data subset. We experimented with one, two, and three iterations of two-means and compared them with the original BKM, whose two-means iterates without limit until the two clusters no longer change. In experiments on three datasets, three iterations and unlimited iterations produced almost the same clustering quality in all test cases, which suggests that three iterations may be adequate. The experimental data also show that the limited-iteration BKM with three iterations was more computationally efficient than the original BKM, suggesting that limiting the iterations in bisecting K-means can improve efficiency while maintaining clustering quality.
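
The idea of capping the two-means iterations can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper. Several details are assumptions made only for illustration: the function names (`two_means`, `bisecting_kmeans`), the `max_iter` parameter, the rule of always splitting the largest cluster, random selection of the two initial centers, and the fallback for a degenerate split. The refinement step is omitted.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def split_by_label(points, labels):
    left = [p for p, lab in zip(points, labels) if lab == 0]
    right = [p for p, lab in zip(points, labels) if lab == 1]
    return left, right


def two_means(points, max_iter=3, rng=random):
    """Bisect `points` (at least 2) with K-means, K=2.

    max_iter (>= 1) caps the assign/update iterations; max_iter=None
    iterates until the assignments stop changing (the unlimited case).
    """
    c0, c1 = rng.sample(points, 2)  # assumption: random initial centers
    labels, iterations = None, 0
    while max_iter is None or iterations < max_iter:
        new_labels = [0 if sq_dist(p, c0) <= sq_dist(p, c1) else 1
                      for p in points]
        if new_labels == labels:
            break  # the two clusters no longer change
        labels = new_labels
        iterations += 1
        left, right = split_by_label(points, labels)
        if not left or not right:
            break  # degenerate split, handled below
        c0, c1 = centroid(left), centroid(right)
    left, right = split_by_label(points, labels)
    if not left or not right:
        # Assumption: fall back to an arbitrary halving so both parts exist.
        half = len(points) // 2
        return points[:half], points[half:]
    return left, right


def bisecting_kmeans(points, k, max_iter=3, seed=0):
    """Repeatedly bisect until k clusters exist (no refinement step)."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Assumption: always bisect the cluster with the most points.
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        target = clusters.pop(idx)
        if len(target) < 2:
            clusters.append(target)  # nothing left to split
            break
        clusters.extend(two_means(target, max_iter, rng))
    return clusters


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (9.9, 10.1), (10.0, 10.0), (5.0, -5.0)]
    for limit in (1, 2, 3, None):
        sizes = sorted(len(c) for c in bisecting_kmeans(data, 3, limit))
        print(limit, sizes)
```

Passing `max_iter=None` gives the unlimited behaviour of the original two-means, so the limited and unlimited variants can be compared on the same data by changing a single argument.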