Communication-Avoiding Krylov Techniques on Graphic Processing Units

Data movement within the graphic processing unit (GPU) memory system and between the CPU and the GPU is a major bottleneck in accelerating Krylov solvers on GPUs. Communication-avoiding techniques reduce the communication cost of Krylov subspace methods by computing several vectors of a Krylov subspace “at once,” using a kernel called “matrix powers.” The matrix powers kernel is implemented on a recent generation of NVIDIA GPUs, and speedups of up to 5.7 times are reported for the communication-avoiding matrix powers kernel compared to the standard sparse matrix-vector multiplication (SpMV) implementation.
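
To illustrate the idea behind the matrix powers kernel, the following is a minimal CPU-side sketch in Python, not the GPU implementation described here. It assumes a tridiagonal matrix so that the dependency pattern is easy to follow, and all function names (`spmv_tridiag`, `krylov_naive`, `krylov_matrix_powers`) are hypothetical. The naive version performs s separate SpMV sweeps over the whole vector. The blocked version reads each block of x once together with s "ghost" entries on each side, then computes all s basis vectors for that block locally, trading some redundant arithmetic for fewer passes over memory.

```python
def spmv_tridiag(lower, diag, upper, x):
    """y = A x for a tridiagonal A, with lower[i] = A[i][i-1], upper[i] = A[i][i+1]."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        acc = diag[i] * x[i]
        if i > 0:
            acc += lower[i] * x[i - 1]
        if i < n - 1:
            acc += upper[i] * x[i + 1]
        y[i] = acc
    return y


def krylov_naive(lower, diag, upper, x, s):
    """Return [x, Ax, ..., A^s x] using s full SpMV sweeps."""
    basis = [list(x)]
    for _ in range(s):
        basis.append(spmv_tridiag(lower, diag, upper, basis[-1]))
    return basis


def krylov_matrix_powers(lower, diag, upper, x, s, block):
    """Return [x, Ax, ..., A^s x], processing rows block by block.

    Each block copies its slice of x plus up to s ghost entries per side,
    then applies A s times locally. The region that is still correct
    shrinks by one entry per side at every step, ending at the block itself.
    """
    n = len(x)
    basis = [[0.0] * n for _ in range(s + 1)]
    basis[0] = list(x)
    for start in range(0, n, block):
        end = min(start + block, n)
        lo = max(start - s, 0)   # first ghost index
        hi = min(end + s, n)     # one past the last ghost index
        cur = list(x[lo:hi])     # single read of x for this block
        for k in range(1, s + 1):
            vlo = max(start - (s - k), 0)
            vhi = min(end + (s - k), n)
            nxt = list(cur)
            for i in range(vlo, vhi):
                j = i - lo       # position of row i in the local buffer
                acc = diag[i] * cur[j]
                if i > 0:
                    acc += lower[i] * cur[j - 1]
                if i < n - 1:
                    acc += upper[i] * cur[j + 1]
                nxt[j] = acc
            cur = nxt
            basis[k][start:end] = cur[start - lo:end - lo]
    return basis


if __name__ == "__main__":
    n, s = 10, 3
    lower, diag, upper = [-1.0] * n, [2.0] * n, [-1.0] * n
    x = [float(i + 1) for i in range(n)]
    same = krylov_naive(lower, diag, upper, x, s) == krylov_matrix_powers(
        lower, diag, upper, x, s, block=4)
    print("blocked result matches naive result:", same)
```

In this sketch the ghost entries near block boundaries are recomputed by neighboring blocks. That redundant work is the price paid for avoiding s separate sweeps over the vector, and for general sparse matrices the ghost region depends on the sparsity pattern rather than on a fixed width.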
