Leveraging Data-Parallelism in ILUPACK using Graphics Processors

In this paper, we address the exploitation of data parallelism for the solution of sparse symmetric positive definite linear systems via iterative methods on Graphics Processing Units (GPUs). In particular, we accelerate the preconditioned conjugate gradient (CG) iterative solver underlying the incomplete LU decomposition package (ILUPACK) by off-loading the most expensive computations (i.e., the solution of sparse triangular systems and the sparse matrix-vector products) to the hardware accelerator. The results, collected using GPUs from the two most recent NVIDIA generations ("Fermi" and "Kepler") and a benchmark test bed of sparse linear systems, show that the GPU-enabled implementations deliver a notable reduction of the execution time while maintaining the convergence rate and numerical properties of the original ILUPACK solver.
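
To make the two off-loaded kernels concrete, the following is a minimal sequential sketch of a preconditioned CG iteration in plain Python. It is not ILUPACK's code and does not run on a GPU. It only marks where the sparse matrix-vector product and the two sparse triangular solves occur in each iteration. Several things here are assumptions for illustration: the CSR layout with the diagonal stored last in each row of the factor, the 1D Laplacian test matrix, all function names, and the symmetric Gauss-Seidel factor, which stands in for ILUPACK's multilevel incomplete factorization.

```python
import math

def csr_matvec(indptr, indices, data, x):
    """y = A x for A in CSR format (the sparse matrix-vector product)."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def lower_solve(indptr, indices, data, b):
    """Solve L y = b; L is lower triangular CSR, diagonal last in each row."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        last = indptr[i + 1] - 1
        s = b[i]
        for k in range(indptr[i], last):
            s -= data[k] * y[indices[k]]
        y[i] = s / data[last]
    return y

def lower_transpose_solve(indptr, indices, data, b):
    """Solve L^T x = b by a backward column sweep over the CSR rows of L."""
    x = list(b)
    for i in range(len(b) - 1, -1, -1):
        last = indptr[i + 1] - 1
        x[i] /= data[last]
        for k in range(indptr[i], last):
            x[indices[k]] -= data[k] * x[i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply_preconditioner(L, r):
    """z = (L L^T)^{-1} r: one forward and one backward triangular solve."""
    return lower_transpose_solve(*L, lower_solve(*L, r))

def pcg(A, L, b, tol=1e-10, max_iter=200):
    """Preconditioned CG for SPD A with preconditioner M = L L^T."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = apply_preconditioner(L, r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        Ap = csr_matvec(*A, p)       # kernel 1: sparse matrix-vector product
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = apply_preconditioner(L, r)   # kernel 2: triangular solves
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

def laplacian_1d(n):
    """Hypothetical test matrix: tridiag(-1, 2, -1) in CSR."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data

def sgs_factor(n):
    """Stand-in factor L = (D + E) D^{-1/2} for laplacian_1d(n)."""
    indptr, indices, data = [0], [], []
    root = math.sqrt(2.0)
    for i in range(n):
        if i > 0:
            indices.append(i - 1)
            data.append(-1.0 / root)
        indices.append(i)            # diagonal entry stored last
        data.append(root)
        indptr.append(len(indices))
    return indptr, indices, data

if __name__ == "__main__":
    n = 20
    A, L = laplacian_1d(n), sgs_factor(n)
    b = csr_matvec(*A, [1.0] * n)    # right-hand side with known solution
    x, iters = pcg(A, L, b)
    print("iterations:", iters)
```

In this sketch the triangular solves are inherently sequential along the row index, which is one reason they are harder to map to a data-parallel device than the matrix-vector product, whose rows are independent.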
