K-Means for Parallel Architectures Using All-Prefix-Sum Sorting and Updating Steps

We present an implementation of parallel K-means clustering, called Kps-means, that achieves high performance with near-full occupancy compute kernels without imposing limits on the number of dimensions and data points permitted as input, thus combining flexibility with high degrees of parallelism and efficiency. As a key element to performance improvement, we introduce parallel sorting as data preprocessing and updating steps. Our final implementation for Nvidia GPUs achieves speedups of up to 200-fold over CPU reference code and of up to three orders of magnitude when compared with popular numerical software packages.

[1]  Michael Granitzer,et al.  Accelerating K-Means on the Graphics Processor via CUDA , 2009, 2009 First International Conference on Intensive Applications and Services.

[2]  Mark J. Harris,et al.  Parallel Prefix Sum (Scan) with CUDA , 2011 .

[3]  He Li,et al.  K-Means on Commodity GPUs with CUDA , 2009, 2009 WRI World Congress on Computer Science and Information Engineering.

[4]  Meichun Hsu,et al.  Clustering billions of data points using GPUs , 2009, UCHPC-MAW '09.

[5]  Russ B. Altman,et al.  WebFEATURE: an interactive web tool for identifying and visualizing functional sites on macromolecular structures , 2003, Nucleic Acids Res..

[6]  Jiawei Han,et al.  CLARANS: A Method for Clustering Objects for Spatial Data Mining , 2002, IEEE Trans. Knowl. Data Eng..

[7]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[8]  Kevin Skadron,et al.  A performance study of general-purpose applications on graphics processors using CUDA , 2008, J. Parallel Distributed Comput..

[9]  Russ B. Altman,et al.  CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms , 2011, Bioinform..

[10]  W. Daniel Hillis,et al.  Data parallel algorithms , 1986, CACM.

[11]  Russ B. Altman,et al.  Independent component analysis: Mining microarray data for fundamental human gene expression modules , 2010, J. Biomed. Informatics.

[12]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[13]  Manoranjan Dash,et al.  Efficient K-Means Clustering Using Accelerated Graphics Processors , 2008, DaWaK.

[14]  Roy H. Campbell,et al.  A Parallel Implementation of K-Means Clustering on GPUs , 2008, PDPTA.

[15]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.