GPU Accelerated Lanczos Algorithm with Applications

Graphics Processing Units provide a large computational power at a very low price which position them as an ubiquitous accelerator. GPGPU is accelerating general purpose computations using GPU's. GPU's have been used to accelerate many Linear Algebra routines and Numerical Methods. Lanczos is an iterative method well suited for finding the extreme eigenvalues and the corresponding eigenvectors of large sparse symmetric matrices. In this paper, we present an implementation of Lanczos Algorithm on GPU using the CUDA programming model and apply it to two important problems : graph bisection using spectral methods, and image segmentation. Our GPU implementation of spectral bisection performs better when compared to both an Intel Math Kernel Library implementation and a Matlab implementation. Our GPU implementation shows a speedup up to 97.3 times over Matlab Implementation and 2.89 times over the Intel Math Kernel Library implementation on a Intel Core i7 920 Processor, which is a quad-core CPU. Similarly, our image segmentation implementation achieves a speed up of 3.27 compared to a multicore CPU based implementation using Intel Math Kernel Library and OpenMP. Through this work, we therefore wish to establish that the GPU may still be a better platform for also highly irregular and computationally intensive applications.

[1]  P. J. Narayanan,et al.  CUDA cuts: Fast graph cuts on the GPU , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[2]  Bruce Hendrickson,et al.  A Multi-Level Algorithm For Partitioning Graphs , 1995, Proceedings of the IEEE/ACM SC95 Conference.

[3]  Michael I. Jordan,et al.  Link Analysis, Eigenvectors and Stability , 2001, IJCAI.

[4]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  P. J. Narayanan,et al.  Singular value decomposition on GPU using CUDA , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[6]  Alex Pothen,et al.  PARTITIONING SPARSE MATRICES WITH EIGENVECTORS OF GRAPHS* , 1990 .

[7]  James Demmel,et al.  Applied Numerical Linear Algebra , 1997 .

[8]  D. Spielman,et al.  Spectral partitioning works: planar graphs and finite element meshes , 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[9]  Wenguang Chen,et al.  Parallelization of spectral clustering algorithm on multi-core processors and GPGPU , 2008, 2008 13th Asia-Pacific Computer Systems Architecture Conference.

[10]  Michael Garland,et al.  Efficient Sparse Matrix-Vector Multiplication on CUDA , 2008 .

[11]  Thomas E. Potok,et al.  Parallel latent semantic analysis using a graphics processing unit , 2009, GECCO '09.

[12]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[13]  JOSEP DÍAZ,et al.  A survey of graph layout problems , 2002, CSUR.

[14]  Shang-Hua Teng,et al.  Spectral partitioning works: planar graphs and finite element meshes , 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[15]  Jacob G. Martin,et al.  Spectral techniques for graph bisection in genetic algorithms , 2006, GECCO.

[16]  Mark A. Richards,et al.  QR decomposition on GPUs , 2009, GPGPU-2.