A fast k-means algorithm based on multi-granularity

The k-means algorithm has been widely used since it was proposed, but the standard k-means algorithm is inefficient on large-scale data. To address this problem, this paper proposes a fast k-means algorithm based on multiple granularities. First, from the coarse-grained perspective, we use the cluster distribution information to narrow the search range of each sample point, which makes the proposed algorithm particularly advantageous when k is large. Second, from the fine-grained perspective, we use upper- and lower-bound rules to reduce the number of sample points involved in distance computation, eliminating many unnecessary distance calculations. Finally, we evaluate the proposed algorithm on several real-world datasets. The experimental results show that it converges hundreds of times faster than standard k-means on average, with the accuracy loss held to about three percent, and that the speedup grows more pronounced as the dataset becomes larger and higher-dimensional.
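The abstract does not spell out the exact bound rules used, but the fine-grained idea of skipping distance computations via upper and lower bounds can be illustrated with a sketch in the spirit of Hamerly's bounded k-means [16]: each point keeps an upper bound on the distance to its assigned center and a lower bound on the distance to any other center, and is re-examined only when the bounds overlap. All names below are illustrative, not the paper's implementation.

```python
import numpy as np

def bounded_kmeans(X, k, iters=20, seed=0):
    """k-means with upper/lower-bound pruning (Hamerly-style sketch).

    Each point keeps an upper bound u on the distance to its assigned
    center and a lower bound l on the distance to every other center.
    After centers move, the bounds are loosened via the triangle
    inequality, and only points with u >= l are re-examined exactly.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, k, replace=False)].copy()

    # Initial exact assignment; also initializes the bounds.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    u = d[np.arange(n), assign]          # distance to own center
    d[np.arange(n), assign] = np.inf
    l = d.min(axis=1)                    # distance to second-closest center

    for _ in range(iters):
        # Update centers and record how far each one moved.
        new_centers = centers.copy()
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                new_centers[j] = pts.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers, axis=1)
        centers = new_centers

        # Loosen bounds by the center movements (triangle inequality).
        u += shift[assign]
        l -= shift.max()

        # Only points whose bounds overlap need exact re-assignment;
        # everyone else provably keeps its current center.
        mask = u >= l
        if mask.any():
            d = np.linalg.norm(X[mask, None, :] - centers[None, :, :], axis=2)
            a = d.argmin(axis=1)
            assign[mask] = a
            idx = np.arange(d.shape[0])
            u[mask] = d[idx, a]
            d[idx, a] = np.inf
            l[mask] = d.min(axis=1)
    return assign, centers
```

Because the bounds are conservative, the pruning never changes the assignments Lloyd's algorithm would produce; on stable clusters most points fail the `u >= l` test, so later iterations touch only a small fraction of the data. The coarse-grained step described in the abstract would further restrict which centers each surviving point is compared against.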
