A Hybrid MPI/OpenMP Parallelization of $K$-Means Algorithms Accelerated Using the Triangle Inequality

The standard formulation of $K$-means clustering (Lloyd’s method) performs many unnecessary distance calculations. In this paper, we focus on four algorithms that use the triangle inequality to avoid them: Drake’s, Elkan’s, Annulus, and Yinyang. We propose a hybrid MPI/OpenMP parallelization of these algorithms in which the dataset and the corresponding data structures storing bounds on distances are evenly divided among MPI processes. In the assignment step of a $K$-means iteration, each MPI process computes the assignment of its portion of the data using OpenMP threads. In the update step, the cluster centroids are computed using a hierarchical all-reduce operation. In computational experiments, we compared the strong scalability of these four algorithms with that of Lloyd’s algorithm parallelized using the same approach. The results indicate that all four algorithms maintain an advantage in computing time over Lloyd’s algorithm. A comparison with two software packages whose source code is publicly available, run in the same computing environment, shows that our implementations are more efficient.
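The structure of one iteration can be illustrated with a minimal sequential sketch. It is not the implementation evaluated in the paper: the loop over data blocks only stands in for MPI processes, the elementwise sum stands in for the all-reduce, and no OpenMP threading is shown. The pruning rule is the simplest inter-centroid test based on the triangle inequality (if $d(c, c') \ge 2\,d(x, c)$, then $d(x, c') \ge d(x, c)$), not the full bound bookkeeping of Drake's, Elkan's, Annulus, or Yinyang. All function names (`dist`, `assign_with_pruning`, `local_sums`, `kmeans_iteration`) are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_with_pruning(points, centroids):
    """Assign each point to its nearest centroid.

    A candidate centroid c_j is skipped when
    d(c_best, c_j) >= 2 * d(x, c_best), because the triangle inequality
    then guarantees d(x, c_j) >= d(x, c_best).
    Returns (labels, number of point-to-centroid distances computed).
    """
    k = len(centroids)
    # Inter-centroid distances, computed once per iteration.
    cc = [[dist(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    labels, computed = [], 0
    for x in points:
        best, best_d = 0, dist(x, centroids[0])
        computed += 1
        for j in range(1, k):
            if cc[best][j] >= 2.0 * best_d:
                continue  # pruned: c_j cannot be strictly closer
            d = dist(x, centroids[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed


def local_sums(points, labels, k, dim):
    """Per-cluster coordinate sums and counts for one block of data."""
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, c in zip(points, labels):
        counts[c] += 1
        for t in range(dim):
            sums[c][t] += x[t]
    return sums, counts


def kmeans_iteration(points, centroids, num_ranks):
    """One iteration with the data split into num_ranks contiguous blocks.

    Each block plays the role of one process's share of the dataset.
    """
    k, dim, n = len(centroids), len(centroids[0]), len(points)
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    labels = []
    for r in range(num_ranks):
        lo, hi = r * n // num_ranks, (r + 1) * n // num_ranks
        block = points[lo:hi]
        # Assignment step on this block only.
        block_labels, _ = assign_with_pruning(block, centroids)
        sums, counts = local_sums(block, block_labels, k, dim)
        # Stand-in for the all-reduce: elementwise sum of partial results.
        for c in range(k):
            total_counts[c] += counts[c]
            for t in range(dim):
                total_sums[c][t] += sums[c][t]
        labels.extend(block_labels)
    # Update step: an empty cluster keeps its previous centroid (an assumption).
    new_centroids = [
        [s / total_counts[c] for s in total_sums[c]] if total_counts[c]
        else list(centroids[c])
        for c in range(k)
    ]
    return labels, new_centroids
```

Because only the per-cluster sums and counts are combined across blocks, the amount of data exchanged in the update step depends on the number of clusters and the dimension, not on the number of points. The assignment is the same whichever way the data is split, and the new centroids agree up to floating-point rounding.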
