HLanc: Heterogeneous Parallel Implementation of the Implicitly Restarted Lanczos Method

The graphics processing unit (GPU) has become a ubiquitous accelerator for general-purpose computing, including linear algebra routines and numerical methods. The implicitly restarted Lanczos method (IRLM) is well suited to the partial eigenvalue problem for large sparse symmetric matrices, which arises in many real-world applications. In this paper, we present the HLanc library, a parallel implementation of IRLM on heterogeneous CPU-GPU architectures using the CUDA programming model. HLanc separates the heterogeneous parallel IRLM solvers from the sparse matrix-vector multiplication (SPMV) operators. The SPMV operators hide the storage details of the sparse matrix from the IRLM solvers, so the solvers can work with any sparse matrix format. In particular, SPMV operators and IRLM solvers can be combined arbitrarily to achieve the best performance on a CPU-GPU heterogeneous system. HLanc is evaluated on eight sparse matrices with NVIDIA GTX 480 and GTX TITAN Black GPUs. The results show that HLanc achieves a 15-times speedup over the ARPACK library and scales well across different GPU generations.
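The separation between solver and SPMV operator described above can be illustrated with a minimal sketch. It is not HLanc's API: the names `CsrOperator`, `DenseOperator`, and `lanczos` are hypothetical, the code runs in plain Python on the CPU rather than CUDA, and it performs only the basic Lanczos recurrence, without implicit restarting or reorthogonalization. The point is that the solver calls nothing but `apply`, so it never sees how the matrix is stored.

```python
import math


class CsrOperator:
    """Hypothetical SPMV operator: y = A x with A in compressed sparse row form."""

    def __init__(self, indptr, indices, data):
        self.indptr, self.indices, self.data = indptr, indices, data
        self.n = len(indptr) - 1

    def apply(self, x):
        y = [0.0] * self.n
        for i in range(self.n):
            for p in range(self.indptr[i], self.indptr[i + 1]):
                y[i] += self.data[p] * x[self.indices[p]]
        return y


class DenseOperator:
    """Hypothetical SPMV operator over a dense list-of-rows matrix."""

    def __init__(self, rows):
        self.rows = rows
        self.n = len(rows)

    def apply(self, x):
        return [sum(a * b for a, b in zip(row, x)) for row in self.rows]


def lanczos(op, v0, k):
    """Run at most k plain Lanczos steps on a symmetric operator.

    Returns (alphas, betas): the diagonal and off-diagonal of the
    tridiagonal matrix T. The solver touches the matrix only via op.apply.
    """
    norm = math.sqrt(sum(c * c for c in v0))
    v = [c / norm for c in v0]
    v_prev = [0.0] * op.n
    beta = 0.0
    alphas, betas = [], []
    for j in range(k):
        w = op.apply(v)
        alpha = sum(a * b for a, b in zip(w, v))
        # Three-term recurrence: w = A v_j - alpha_j v_j - beta_{j-1} v_{j-1}
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(c * c for c in w))
        if j == k - 1 or beta < 1e-12:  # last step, or invariant subspace found
            break
        betas.append(beta)
        v_prev, v = v, [c / beta for c in w]
    return alphas, betas


# The same symmetric 3x3 matrix in two storage formats.
dense = DenseOperator([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
csr = CsrOperator([0, 2, 5, 7], [0, 1, 0, 1, 2, 1, 2],
                  [2.0, 1.0, 1.0, 3.0, 1.0, 1.0, 4.0])

print(lanczos(csr, [1.0, 0.0, 0.0], 3))
print(lanczos(dense, [1.0, 0.0, 0.0], 3))
```

Both calls produce the same tridiagonal coefficients, because the solver depends only on the result of `apply`. In a GPU setting the same idea would let a solver be paired with whichever storage format performs best for a given matrix.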