Two Modifications of Yinyang K-means Algorithm

This paper considers Yinyang K-means, a very fast algorithm for the K-means clustering problem. The algorithm uses an initial grouping of cluster centroids and the triangle inequality to avoid unnecessary distance calculations. We propose two modifications of Yinyang K-means: regrouping the cluster centroids during the run of the algorithm, and replacing the grouping procedure with a method that generates groups of equal size. The influence of these two modifications on the efficiency of Yinyang K-means is evaluated experimentally on seven datasets. The results indicate that the new grouping procedure reduces the runtime of the algorithm. For one of the tested datasets, it runs up to 2.8 times faster.
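To illustrate how grouping centroids and the triangle inequality can avoid distance calculations, the following is a minimal sketch of a group-level filter in the spirit of Yinyang K-means. It is not the authors' implementation: the function names (`can_skip_group`, `max_drift`) and the toy coordinates are assumptions made for illustration only. The idea is that, after the centroids move, an upper bound on the distance to the assigned centroid can only grow by that centroid's drift, and a lower bound on the distance to any centroid in a group can only shrink by the largest drift in that group.

```python
import math

def max_drift(old_centroids, new_centroids):
    """Largest distance moved by any centroid in a group (hypothetical helper)."""
    return max(math.dist(o, n) for o, n in zip(old_centroids, new_centroids))

def can_skip_group(upper_bound, group_lower_bound, assigned_drift, group_max_drift):
    """Return True if no centroid in the group can be closer than the assigned one.

    By the triangle inequality:
      d(x, new assigned centroid)  <= upper_bound + assigned_drift
      d(x, any new group centroid) >= group_lower_bound - group_max_drift
    """
    new_upper = upper_bound + assigned_drift
    new_lower = group_lower_bound - group_max_drift
    return new_lower >= new_upper

# Toy example (all coordinates are made up for illustration).
x = (0.0, 0.0)
assigned_old, assigned_new = (1.0, 0.0), (1.2, 0.0)
group_old = [(5.0, 0.0), (0.0, 6.0)]
group_new = [(4.5, 0.0), (0.0, 6.5)]

upper = math.dist(x, assigned_old)               # exact distance, used as upper bound
lower = min(math.dist(x, c) for c in group_old)  # exact minimum, used as lower bound

skip = can_skip_group(
    upper,
    lower,
    math.dist(assigned_old, assigned_new),
    max_drift(group_old, group_new),
)
print(skip)  # the whole group is skipped without computing any new distance
```

When the filter returns `True`, every distance from the point to the centroids of that group can be skipped for the current iteration. The cost of evaluating the filter depends on how the centroids are partitioned into groups, which is the part of the algorithm the two proposed modifications address.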
