A direct tridiagonal solver based on Givens rotations for GPU architectures

A parallel solver for general irreducible tridiagonal systems is described. The solver is based on the Spike framework and Givens QR with occasional low-rank modification. The modifications handle singularities exposed by QR in blocks of the parallel partition. The GPU implementation has performance similar to that of existing methods. The method returns accurate results when current GPU tridiagonal solvers fail.

g-Spike, a parallel algorithm for solving general nonsymmetric tridiagonal systems on the GPU, and its CUDA implementation are described. The solver is based on the Spike framework and applies Givens rotations to compute a QR factorization without pivoting. It also implements a low-rank modification strategy that computes the Spike DS decomposition even when the partitioning produces singular submatrices along the diagonal. The same method is used to solve the reduced system resulting from the Spike partitioning. Numerical experiments with problems of high order indicate that g-Spike is competitive in runtime with existing GPU methods and can provide acceptable results when other methods cannot be applied or fail.
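To make the role of Givens rotations concrete, the following is a minimal sequential sketch in Python of a QR-based tridiagonal solve without pivoting. It is not the g-Spike algorithm or its CUDA implementation: there is no Spike partitioning, no low-rank modification, and no parallelism. The function name `givens_tridiagonal_solve` and the storage convention (`sub[k]` holds entry (k+1, k), `sup[k]` holds entry (k, k+1)) are assumptions made for illustration only.

```python
import math

def givens_tridiagonal_solve(sub, diag, sup, rhs):
    """Solve a tridiagonal system with Givens QR (illustrative sketch).

    sub[k]  = A[k+1][k]  (length n-1)
    diag[k] = A[k][k]    (length n)
    sup[k]  = A[k][k+1]  (length n-1)
    """
    n = len(diag)
    # R is upper triangular with two superdiagonals: r0, r1, r2.
    r0 = [float(v) for v in diag]
    r1 = [float(v) for v in sup[:n - 1]] + [0.0]
    r2 = [0.0] * n
    y = [float(v) for v in rhs]

    for k in range(n - 1):
        a, b = r0[k], float(sub[k])
        h = math.hypot(a, b)
        # Rotation that maps (a, b) to (h, 0); identity if both are zero.
        c, s = (a / h, b / h) if h != 0.0 else (1.0, 0.0)
        r0[k] = h
        # Apply the rotation to rows k and k+1, column k+1.
        t1, t2 = r1[k], r0[k + 1]
        r1[k] = c * t1 + s * t2
        r0[k + 1] = -s * t1 + c * t2
        # Column k+2: row k holds a zero there, so fill-in appears in r2.
        t3 = r1[k + 1]
        r2[k] = s * t3
        r1[k + 1] = c * t3
        # Apply the same rotation to the right-hand side.
        y[k], y[k + 1] = c * y[k] + s * y[k + 1], -s * y[k] + c * y[k + 1]

    # Back substitution with the banded upper triangular factor.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = y[i]
        if i + 1 < n:
            acc -= r1[i] * x[i + 1]
        if i + 2 < n:
            acc -= r2[i] * x[i + 2]
        if r0[i] == 0.0:
            raise ZeroDivisionError("R has a zero diagonal entry; matrix is singular")
        x[i] = acc / r0[i]
    return x

# A nonsingular matrix with zeros on the diagonal, where elimination
# without pivoting would break down at the first step.
solution = givens_tridiagonal_solve([1, 1], [0, 0, 1], [1, 1], [2, 4, 5])
```

Because each rotation is orthogonal, the factorization needs no row interchanges even when a diagonal entry is zero, which is the property the abstract relies on. In this sketch a singular matrix only shows up as a zero on the diagonal of R during back substitution.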
