Parallelizing Alternating Direction Implicit Solver on GPUs

We present a parallel Alternating Direction Implicit (ADI) solver on GPUs. Our implementation improves on existing implementations in two respects. First, we address the scalability issue of existing Parallel Cyclic Reduction (PCR) implementations by eliminating their hardware resource constraints. As a result, our parallel ADI solver, which is based on PCR, no longer has a maximum domain size limitation. Second, we optimize the inefficient data accesses of the parallel ADI solver by leveraging hardware texture memory and matrix transpose techniques. These memory optimizations make the already parallelized ADI solver a further two times faster, for an overall speedup of more than 100 times over a highly optimized CPU version. We also analyze the numerical accuracy of the proposed parallel ADI solver.
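To make the PCR building block concrete, here is a minimal sequential Python sketch. It is not the authors' GPU implementation. It assumes a single tridiagonal system written as a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] with a[0] = 0 and c[n-1] = 0, and it emulates each parallel PCR step with a loop over all equations. On a GPU, each iteration of that inner loop would be one thread. The names `pcr_solve` and `solve_columns` are hypothetical helpers chosen for illustration, and `solve_columns` is only one plausible reading of the transpose idea.

```python
from typing import List


def pcr_solve(a: List[float], b: List[float], c: List[float],
              d: List[float]) -> List[float]:
    """Solve a tridiagonal system with Parallel Cyclic Reduction (PCR).

    Equation i: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
    with a[0] == 0 and c[-1] == 0. Each outer iteration is one PCR step;
    the inner loop stands in for one GPU thread per equation.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):  # every equation is updated independently
            lo, hi = i - stride, i + stride
            # Multipliers that eliminate x[i-stride] and x[i+stride];
            # neighbours outside the system contribute nothing.
            alpha = -a[i] / b[lo] if lo >= 0 else 0.0
            gamma = -c[i] / b[hi] if hi < n else 0.0
            na[i] = alpha * a[lo] if lo >= 0 else 0.0
            nc[i] = gamma * c[hi] if hi < n else 0.0
            nb[i] = (b[i]
                     + (alpha * c[lo] if lo >= 0 else 0.0)
                     + (gamma * a[hi] if hi < n else 0.0))
            nd[i] = (d[i]
                     + (alpha * d[lo] if lo >= 0 else 0.0)
                     + (gamma * d[hi] if hi < n else 0.0))
        a, b, c, d = na, nb, nc, nd
        stride *= 2  # equations now couple unknowns twice as far apart
    # All off-diagonal coefficients are now zero: b[i]*x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


def solve_columns(a: List[float], b: List[float], c: List[float],
                  grid: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: solve the same system along every grid column."""
    # Transpose so each column becomes a contiguous row (one system per row).
    columns = [list(col) for col in zip(*grid)]
    solved = [pcr_solve(a, b, c, col) for col in columns]
    # Transpose back to the original row-major layout.
    return [list(row) for row in zip(*solved)]


if __name__ == "__main__":
    a = [0.0, 1.0, 1.0, 1.0, 1.0]
    b = [4.0] * 5
    c = [1.0, 1.0, 1.0, 1.0, 0.0]
    d = [6.0, 12.0, 18.0, 24.0, 24.0]  # right-hand side for x = [1, 2, 3, 4, 5]
    print([round(v, 10) for v in pcr_solve(a, b, c, d)])  # prints [1.0, 2.0, 3.0, 4.0, 5.0]
```

Each reduction step doubles the distance between coupled unknowns, so after about log2(n) steps every equation is decoupled and can be solved independently. In an ADI sweep, one direction solves one such system per grid row and the other direction one per column. `solve_columns` illustrates the general transpose technique: the grid is reorganized so that column systems are stored contiguously, at the cost of an extra transpose pass. Whether the paper applies its transpose in exactly this way is an assumption here.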
