Speeding up K-Means Algorithm by GPUs

Cluster analysis plays a critical role in a wide variety of applications, but it now faces a computational challenge due to continuously increasing data volumes. Parallel computing is one of the most promising ways to overcome this challenge. In this paper, we parallelize k-Means, one of the most popular clustering algorithms, on widely available Graphics Processing Units (GPUs). Unlike existing GPU-based k-Means algorithms, ours treats data dimensionality as an important factor to take into account when parallelizing k-Means on GPUs. In particular, we use two different strategies, one for low-dimensional data sets and one for high-dimensional data sets, in order to make the best use of the GPU's computing power. For low-dimensional data sets, we exploit GPU on-chip registers to significantly decrease data access latency. For high-dimensional data sets, we design a novel algorithm that simulates matrix multiplication and exploits both GPU on-chip registers and on-chip shared memory to achieve a high compute-to-memory-access ratio. As a result, our GPU-based k-Means algorithm is three to eight times faster than the best reported GPU-based algorithm.
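To make the connection between k-Means and matrix multiplication concrete, the following is a minimal CPU sketch in plain Python, not the GPU kernels described above. It assumes squared Euclidean distance and uses the expansion ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, in which the cross terms x.c for all point-centroid pairs have the same loop structure as multiplying an n x d matrix by a d x k matrix. The function names `assign_clusters` and `update_centroids` are hypothetical and chosen for illustration only; the abstract does not specify how the actual algorithm is organized.

```python
# Illustrative sketch only: a sequential k-Means iteration, written so that
# the distance computation has the shape of a matrix product.
# All names here are hypothetical and not taken from the paper.

def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    # Cross terms x . c for every (point, centroid) pair. This loop nest is
    # structurally an (n x d) by (d x k) matrix multiplication.
    cross = [[sum(x * c for x, c in zip(p, q)) for q in centroids]
             for p in points]
    # Squared norms of the centroids, computed once per iteration.
    c_norms = [sum(c * c for c in q) for q in centroids]
    labels = []
    for row in cross:
        # ||x||^2 is the same for every centroid, so it does not affect
        # which centroid is nearest and can be left out.
        scores = [c_norms[j] - 2.0 * row[j] for j in range(len(centroids))]
        labels.append(min(range(len(scores)), key=scores.__getitem__))
    return labels


def update_centroids(points, labels, centroids):
    """Return new centroids as the mean of the points assigned to each."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p, label in zip(points, labels):
        counts[label] += 1
        for i, x in enumerate(p):
            sums[label][i] += x
    # An empty cluster keeps its previous centroid (an assumption made here).
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(len(centroids))]


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
labels = assign_clusters(points, centroids)
centroids = update_centroids(points, labels, centroids)
```

On a GPU, the cross-term computation is the part that benefits from tiling through shared memory and registers, because each loaded point or centroid value is reused across many multiply-adds.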
