Accelerated K-Means Algorithms for Low-Dimensional Data on Parallel Shared-Memory Systems

This paper considers exact accelerated algorithms for the K-means clustering of low-dimensional data on modern multi-core systems. A version of the filtering algorithm parallelized using the OpenMP (Open Multi-Processing) standard is proposed. The algorithm employs a kd-tree structure to skip unnecessary distance calculations between cluster centroids and feature vectors. In our approach, both the kd-tree construction and the iterations of K-means are parallelized using the OpenMP tasking mechanism. A new task is created for each recursive call performed during kd-tree construction and traversal. The tasks are executed in parallel by the cores of a shared-memory system. In computational experiments, we evaluated the parallel efficiency of our approach and compared its performance to the parallel Lloyd’s method, a GPU (Graphics Processing Unit) formulation of the K-means algorithm, and two parallel triangle inequality-based algorithms intended for low-dimensional data. The evaluation was performed on six synthetic datasets drawn from two distributions and seven real-life datasets. The experiments, executed on a 24-core system, indicated that our version of the filtering algorithm achieved satisfactory or high parallel efficiency, and its runtime was much shorter than those of the competing algorithms. However, the advantage of the parallel filtering algorithm decreased rapidly as the dimensionality of the data increased.

[1]  Olli Nevalainen,et al.  A fast exact GLA based on code vector activity detection , 2000, IEEE Trans. Image Process..

[2]  Emmett Kilgariff,et al.  Fermi GF100 GPU Architecture , 2011, IEEE Micro.

[3]  Sami Sieranoja,et al.  How much can k-means be improved by using better initialization and repeats? , 2019, Pattern Recognit..

[4]  Shokri Z. Selim,et al.  K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Robert M. Gray,et al.  An Improvement of the Minimum Distortion Encoding Algorithm for Vector Quantization , 1985, IEEE Trans. Commun..

[6]  Charles Elkan,et al.  Using the Triangle Inequality to Accelerate k-Means , 2003, ICML.

[7]  Jonathan Drake,et al.  Accelerated k-means with adaptive distance bounds , 2012 .

[8]  Rafael Radkowski,et al.  Parallel kd-Tree Construction on the GPU with an Adaptive Split and Sort Strategy , 2018, International Journal of Parallel Programming.

[9]  Martin Kruliš,et al.  Detailed Analysis and Optimization of CUDA K-means Algorithm , 2020, ICPP.

[10]  Richard A. Regueiro,et al.  Superlinear speedup phenomenon in parallel 3D Discrete Element Method (DEM) simulations of complex-shaped particles , 2018, Parallel Comput..

[11]  G. Weatherill,et al.  Delineation of shallow seismic source zones using K-means cluster analysis, with application to the Aegean region , 2009 .

[12]  Kunle Olukotun,et al.  The Future of Microprocessors , 2005, ACM Queue.

[13]  Sanjay Ranka,et al.  An effic ient k-means clustering algorithm , 1997 .

[14]  Ruoming Jin,et al.  Shared memory parallelization of data mining algorithms: techniques, programming interface, and performance , 2005, IEEE Transactions on Knowledge and Data Engineering.

[15]  Inderjit S. Dhillon,et al.  A Data-Clustering Algorithm on Distributed Memory Multiprocessors , 1999, Large-Scale Parallel Data Mining.

[16]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[17]  Yue Zhao,et al.  Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup , 2015, ICML.

[18]  Anil K. Jain Data clustering: 50 years beyond K-means , 2008, Pattern Recognit. Lett..

[19]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[20]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[21]  Jordi Petit,et al.  Parallel Partition Revisited , 2008, WEA.

[22]  Wojciech Kwedlo,et al.  An OpenMP Parallelization of the K-means Algorithm Accelerated Using KD-trees , 2019, PPAM.

[23]  Sergei Vassilvitskii,et al.  Scalable K-Means++ , 2012, Proc. VLDB Endow..

[24]  Rui Xu,et al.  Survey of clustering algorithms , 2005, IEEE Transactions on Neural Networks.

[25]  Greg Hamerly,et al.  Making k-means Even Faster , 2010, SDM.

[26]  I. Selim,et al.  Open cluster membership probability based on K-means clustering algorithm , 2016 .

[27]  Guangwen Yang,et al.  Large-Scale Hierarchical k-means for Heterogeneous Many-Core Supercomputers , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.

[28]  Wojciech Kwedlo,et al.  A Hybrid MPI/OpenMP Parallelization of $K$ -Means Algorithms Accelerated Using the Triangle Inequality , 2019, IEEE Access.

[29]  Giuseppe Di Fatta,et al.  Dynamic Load Balancing in Parallel KD-Tree k-Means , 2010, 2010 10th IEEE International Conference on Computer and Information Technology.

[30]  D. Sculley,et al.  Web-scale k-means clustering , 2010, WWW '10.

[31]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  George A. Constantinides,et al.  FPGA-based K-means clustering using tree-based data structures , 2013, 2013 23rd International Conference on Field programmable Logic and Applications.

[33]  M. Emre Celebi,et al.  Improving the performance of k-means for color quantization , 2011, Image Vis. Comput..

[34]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[35]  Dimitris Achlioptas,et al.  Database-friendly random projections: Johnson-Lindenstrauss with binary coins , 2003, J. Comput. Syst. Sci..

[36]  Greg Hamerly,et al.  Accelerating Lloyd’s Algorithm for k -Means Clustering , 2015 .

[37]  Michael Granitzer,et al.  Accelerating K-Means on the Graphics Processor via CUDA , 2009, 2009 First International Conference on Intensive Applications and Services.

[38]  Philip S. Yu,et al.  Top 10 algorithms in data mining , 2007, Knowledge and Information Systems.

[39]  E. Forgy Cluster analysis of multivariate data : efficiency versus interpretability of classifications , 1965 .

[40]  Jing Wang,et al.  Fast approximate k-means via cluster closures , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  David Pettinger,et al.  Scalability of efficient parallel K-Means , 2009, 2009 5th IEEE International Conference on E-Science Workshops.

[42]  Ranjan Maitra,et al.  Simulating Data to Study Performance of Finite Mixture Modeling and Clustering Algorithms , 2010 .

[43]  D.M. Mount,et al.  An Efficient k-Means Clustering Algorithm: Analysis and Implementation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[44]  Robert A. van de Geijn,et al.  Collective communication: theory, practice, and experience , 2007, Concurr. Comput. Pract. Exp..

[45]  Paul Chow,et al.  K-means implementation on FPGA for high-dimensional data using triangle inequality , 2012, 22nd International Conference on Field Programmable Logic and Applications (FPL).

[46]  Christian Böhm,et al.  Multi-core K-means , 2017, SDM.

[47]  Chao Yang,et al.  Enabling Highly Efficient k-Means Computations on the SW26010 Many-Core Processor of Sunway TaihuLight , 2019, Journal of Computer Science and Technology.

[48]  Henrique C. Freitas,et al.  Parallel and distributed kmeans to identify the translation initiation site of proteins , 2012, 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC).

[49]  Salvatore Cuomo,et al.  A GPU-accelerated parallel K-means algorithm , 2017, Comput. Electr. Eng..

[50]  François Fleuret,et al.  Fast k-means with accurate bounds , 2016, ICML.

[51]  Alejandro Duran,et al.  The Design of OpenMP Tasks , 2009, IEEE Transactions on Parallel and Distributed Systems.

[52]  Pierre Hansen,et al.  NP-hardness of Euclidean sum-of-squares clustering , 2008, Machine Learning.

[53]  Xin-She Yang,et al.  Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.

[54]  Stephen Taylor,et al.  A Practical Approach to Dynamic Load Balancing , 1998, IEEE Trans. Parallel Distributed Syst..

[55]  Message Passing Interface Forum MPI: A message - passing interface standard , 1994 .

[56]  Meichun Hsu,et al.  Clustering billions of data points using GPUs , 2009, UCHPC-MAW '09.

[57]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[58]  R. Beaton,et al.  Identifying Sagittarius Stream Stars by Their APOGEE Chemical Abundance Signatures , 2019, The Astrophysical Journal.

[59]  Tommi Kärkkäinen,et al.  Scalable Initialization Methods for Large-Scale Clustering , 2020, ArXiv.

[60]  S. Ra,et al.  A fast mean-distance-ordered partial codebook search algorithm for image vector quantization , 1993 .

[61]  Vijay S Pande,et al.  K-Means for Parallel Architectures Using All-Prefix-Sum Sorting and Updating Steps , 2013, IEEE Transactions on Parallel and Distributed Systems.

[62]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[63]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[64]  Allen Gersho,et al.  Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.

[65]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[66]  Andrew W. Moore,et al.  Accelerating exact k-means algorithms with geometric reasoning , 1999, KDD '99.