A vectorized k-means algorithm for compressed datasets: design and experimental analysis

Clustering algorithms (e.g., Gaussian mixture models, k-means) group a set of elements so that elements in the same group (or cluster) are more similar to each other than to elements in other clusters. This simple concept is the basis of complex algorithms in many application areas, including sequence analysis and genotyping in bioinformatics, medical imaging, antimicrobial activity, market research, and social networking. However, as data volumes continue to increase, the performance of clustering algorithms is heavily influenced by the memory subsystem. In this paper, we propose a novel and efficient implementation of Lloyd’s k-means clustering algorithm that substantially reduces data movement along the memory hierarchy. Our contributions build on the fact that the vast majority of processors are equipped with powerful Single Instruction Multiple Data (SIMD) instructions that are, in most cases, underused. SIMD increases the computational power of the CPU and, if used wisely, offers an opportunity to reduce the application's data transfers by compressing and decompressing the data, especially for memory-bound applications. Our contributions include a SIMD-friendly data layout, in-register implementations of key functions, and SIMD-based compression. We demonstrate that, with our optimized SIMD-based compression method, the performance and energy of k-means improve by factors of 4.5x and 8.7x, respectively, on an i7 Haswell machine, and by 22x and 22.2x on a Xeon Phi (KNL), running a single thread.
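For readers unfamiliar with Lloyd's algorithm, the following is a minimal, scalar sketch of its two alternating steps (assignment and centroid update). It is not the implementation proposed in the paper: it uses no SIMD instructions and no compression. The function name `lloyd_kmeans` and the dimension-major input `columns` are illustrative assumptions, with the layout chosen only to hint at the kind of contiguous per-dimension storage that vector units typically favor.

```python
from typing import List, Tuple


def lloyd_kmeans(
    columns: List[List[float]],
    centroids: List[List[float]],
    iterations: int = 10,
) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd's k-means (illustrative sketch, not the paper's code).

    columns[d][i]   -> coordinate d of point i (dimension-major layout, assumed)
    centroids[c][d] -> coordinate d of centroid c (initial guess)
    """
    n_dims = len(columns)
    n_points = len(columns[0])
    k = len(centroids)
    centroids = [list(c) for c in centroids]  # work on a copy
    labels = [0] * n_points

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        for i in range(n_points):
            best, best_dist = 0, float("inf")
            for c in range(k):
                dist = sum(
                    (columns[d][i] - centroids[c][d]) ** 2 for d in range(n_dims)
                )
                if dist < best_dist:
                    best, best_dist = c, dist
            labels[i] = best

        # Update step: move each centroid to the mean of its assigned points.
        sums = [[0.0] * n_dims for _ in range(k)]
        counts = [0] * k
        for i in range(n_points):
            c = labels[i]
            counts[c] += 1
            for d in range(n_dims):
                sums[c][d] += columns[d][i]

        new_centroids = [
            [sums[c][d] / counts[c] for d in range(n_dims)]
            if counts[c]
            else centroids[c]  # keep an empty cluster's centroid unchanged
            for c in range(k)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids

    return centroids, labels


if __name__ == "__main__":
    # Four 2-D points: (0, 0), (0, 2), (10, 0), (10, 2), stored per dimension.
    cols = [[0.0, 0.0, 10.0, 10.0], [0.0, 2.0, 0.0, 2.0]]
    print(lloyd_kmeans(cols, [[0.0, 0.0], [10.0, 0.0]]))
```

Every iteration of the assignment step reads the whole dataset once per centroid, which is why the memory subsystem dominates the running time on large inputs.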
