Minimizing communication in sparse matrix solvers

Data communication, both within the memory system of a single processor node and between multiple nodes in a system, is the bottleneck in many iterative sparse matrix solvers such as CG and GMRES. In a conventional implementation, k iterations perform k sparse matrix-vector multiplications and Ω(k) vector operations such as dot products, so communication grows by a factor of Ω(k) in both the memory system and the network. By reorganizing the sparse matrix kernel to compute a set of matrix-vector products at once, and reorganizing the rest of the algorithm accordingly, we can perform k iterations by sending O(log P) messages instead of O(k · log P) messages on a parallel machine, and by reading the matrix A from DRAM to cache just once instead of k times on a sequential machine. This reduces communication to the minimum possible. We combine these techniques to form a new variant of GMRES. Our shared-memory implementation on an 8-core Intel Clovertown achieves speedups of up to 4.3x over standard GMRES, without sacrificing convergence rate or numerical stability.
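
The sketch below (Python with NumPy/SciPy) is meant only to illustrate the restructuring described above: build a block of Krylov basis vectors with one kernel call and orthogonalize them with a single reduction, instead of interleaving one sparse matrix-vector product with dot products every iteration. The function names matrix_powers and s_step_basis, the step count s, and the use of NumPy's dense QR in place of a parallel Tall Skinny QR (TSQR) are illustrative assumptions, not the paper's implementation. In particular, the plain loop below still reads A s times; the communication-avoiding kernel instead blocks A so that each cache block or processor produces its rows of all s basis vectors from a single read of A.

import numpy as np
import scipy.sparse as sp

def matrix_powers(A, v, s):
    """Return V = [v, A v, A^2 v, ..., A^s v] as columns.

    A real communication-avoiding kernel would block A so each cache block
    (or processor, using ghost zones) computes its rows of all s + 1 vectors
    from one read of A; this plain loop only illustrates the interface.
    """
    V = np.empty((A.shape[0], s + 1))
    V[:, 0] = v
    for j in range(s):
        V[:, j + 1] = A @ V[:, j]
    return V

def s_step_basis(A, v, s):
    """One 'outer' step of an s-step Krylov method: build the basis with one
    kernel call, then orthogonalize it all at once (one reduction) instead of
    s separate rounds of dot products."""
    V = matrix_powers(A, v / np.linalg.norm(v), s)
    Q, R = np.linalg.qr(V)   # stands in for TSQR on a parallel machine
    return Q, R

if __name__ == "__main__":
    n, s = 1000, 4
    # A random sparse test matrix; shifted by the identity to avoid a zero diagonal.
    A = sp.random(n, n, density=0.01, format="csr") + sp.eye(n)
    Q, R = s_step_basis(A, np.random.rand(n), s)
    print(np.allclose(Q.T @ Q, np.eye(s + 1)))   # basis is orthonormal

Note that the monomial basis [v, A v, ..., A^s v] used here is only for illustration; preserving convergence rate and numerical stability at larger s requires a better-conditioned basis (for example, a Newton basis) together with a stable block orthogonalization such as TSQR.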
