Paper information - A Limited-Iteration Bisecting K-Means for Fast Clustering Large Datasets

Bisecting K-means (BKM) clustering, with or without refinement, has been shown to offer higher computing efficiency, better clustering quality, and lower susceptibility to the choice of initial cluster centers than the basic K-means clustering algorithm. In this paper, we investigate a variant of bisecting K-means with refinement that increases efficiency while trying to maintain clustering quality. Our approach is to limit the number of iterations of the two-means step (K-means with K=2) used to bisect a data subset. We experimented with one, two, and three iterations of the two-means and compared them with the original BKM, whose two-means iterates without limit until the two clusters no longer change. In experiments on three datasets, three iterations and unlimited iterations produced almost the same clustering quality in all test cases, which suggests that three iterations might be adequate. The experimental data also show that the limited-iteration BKM with three iterations was more computationally efficient than the original BKM, suggesting that limiting the iterations in bisecting K-means has the potential to achieve higher efficiency while maintaining clustering quality.
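
The idea of capping the two-means iterations can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`two_means`, `bisecting_kmeans`), the random choice of the two initial centers, and the rule of always splitting the cluster with the largest sum of squared errors are assumptions made for illustration, since the abstract does not specify them. The refinement step is omitted.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, max_iter=3, rng=None):
    """Bisect `points` with at most `max_iter` two-means iterations.

    max_iter=None mimics the unlimited variant: iterate until the
    assignments stop changing.
    """
    if max_iter is not None and max_iter < 1:
        raise ValueError("max_iter must be at least 1 or None")
    rng = rng or random.Random(0)
    # Assumption: initial centers are two distinct data points chosen at random.
    c0, c1 = rng.sample(points, 2)
    labels = None
    iterations = 0
    while max_iter is None or iterations < max_iter:
        new_labels = [0 if sq_dist(p, c0) <= sq_dist(p, c1) else 1
                      for p in points]
        if new_labels == labels:  # converged before reaching the cap
            break
        labels = new_labels
        iterations += 1
        left = [p for p, lab in zip(points, labels) if lab == 0]
        right = [p for p, lab in zip(points, labels) if lab == 1]
        if not left or not right:
            # Degenerate guard (e.g. duplicate points): force a non-empty split.
            return points[:1], points[1:]
        c0, c1 = centroid(left), centroid(right)
    return left, right


def bisecting_kmeans(points, k, max_iter=3, seed=0):
    """Split repeatedly until k clusters exist (no refinement step)."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) >= 2]
        if not candidates:
            break
        # Assumption: split the cluster with the largest SSE next.
        target = max(candidates, key=sse)
        clusters.remove(target)
        clusters.extend(two_means(target, max_iter, rng))
    return clusters


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0),
            (10.0, 11.0), (11.0, 10.0), (20.0, 0.0), (21.0, 1.0)]
    for cluster in bisecting_kmeans(data, k=3, max_iter=3):
        print(sorted(cluster))
```

Passing `max_iter=None` gives the unlimited baseline, so the same function can be used to compare one, two, three, and unlimited iterations on a dataset of interest.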
