Parallelization of spectral clustering algorithm on multi-core processors and GPGPU

Spectral clustering is a widely used algorithm in information retrieval, data mining, machine learning and many other fields. It can group a large number of data points into several categories without requiring additional information about the dataset or the categories, so that people can easily find information by category. In this paper, we parallelize the algorithm proposed by Andrew Y. Ng, Michael I. Jordan and Yair Weiss. We provide two implementations: one is parallelized with OpenMP; the other is programmed in NVIDIA CUDA (Compute Unified Device Architecture), the environment NVIDIA provides for programming its CUDA-enabled GPGPUs (general-purpose graphics processing units). In our experiments, we achieve a speedup of about three times with OpenMP and around ten times with CUDA.

[1] Ilya Burylov. Intel Performance Libraries: Multi-core-ready software for numeric-intensive computation. 2007.

[2] Yao Zhang et al. Scan primitives for GPU computing. GH '07, 2007.

[3] Xin Liu et al. Document clustering based on non-negative matrix factorization. SIGIR, 2003.

[4] Harold S. Park et al. Bridging scale method and its applications. CSE 2007, 2007.

[5] Maurice Clint et al. A highly parallel explicitly restarted Lanczos algorithm. PARA, 1996.

[6] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 2007.

[7] D. Sorensen. Implicitly restarted Arnoldi/Lanczos methods for large scale eigenvalue calculations. 1996.

[8] D. C. Sorensen et al. A portable implementation of ARPACK for distributed memory parallel architectures. 1996.

[9] Andrew Y. Ng, Michael I. Jordan and Yair Weiss. On spectral clustering: Analysis and an algorithm. NIPS, 2001.

[10] David Kirk et al. NVIDIA CUDA software and GPU parallel computing architecture. ISMM '07, 2007.

[11] Manish Vachharajani et al. GPU acceleration of numerical weather prediction. IPDPS, 2008.

[12] D. Calvetti et al. An implicitly restarted Lanczos method for large symmetric eigenvalue problems. 1994.

[13] Wen-mei W. Hwu et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. PPoPP, 2008.

[14] S. Qian et al. A fast decomposition of banded symmetric Toeplitz matrices for parallel processing. Proceedings of the 1999 IEEE International Symposium on Circuits and Systems (ISCAS '99), 1999.

[15] Weiguo Liu et al. Molecular dynamics simulations on commodity GPUs with CUDA. HiPC, 2007.