The Impact of Panel Factorization on the Gauss-Huard Algorithm for the Solution of Linear Systems on Modern Architectures

The Gauss-Huard algorithm (GHA) is a specialized version of Gauss-Jordan elimination for the solution of linear systems. Enhanced with column pivoting, it exhibits numerical stability and computational cost close to those of the conventional solver based on the LU factorization with row pivoting. Furthermore, the GHA can be formulated as a procedure rich in matrix multiplications, so that high performance can be expected on current architectures with multi-layered memories. Unfortunately, the GHA does not, in principle, admit look-ahead, a technique that has proven effective at improving the performance of the LU factorization on multi-threaded platforms with high levels of hardware concurrency. In this paper we analyze the effect of this drawback on the implementation of the GHA on systems accelerated with graphics processing units (GPUs), exposing the roles of the CPU-to-GPU and single-precision-to-double-precision performance ratios, as well as the contribution of the operations in the algorithm’s critical path.
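To make the structure of the method concrete, the following is a minimal sketch of an unblocked Gauss-Huard solver with column pivoting, written with NumPy. It is an illustration under stated assumptions, not the blocked, GPU-accelerated implementation analyzed in the paper: the function name `gauss_huard_solve` is hypothetical, the system is assumed to be square and dense, and no panel factorization or look-ahead is attempted.

```python
import numpy as np


def gauss_huard_solve(A, b):
    """Solve A x = b with an unblocked Gauss-Huard sweep and column pivoting.

    Hypothetical helper for illustration only; it works on a copy of the
    augmented matrix [A | b] and leaves the inputs untouched.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, b.reshape(n, 1)])  # augmented matrix, n x (n + 1)
    perm = np.arange(n)                  # tracks the column (unknown) ordering

    for k in range(n):
        # Row elimination: annihilate the entries to the left of the diagonal
        # in row k, using the k rows that have already been processed.
        M[k, k:] -= M[k, :k] @ M[:k, k:]

        # Column pivoting: bring the largest entry of row k (within the
        # columns of A) to the diagonal position.
        p = k + int(np.argmax(np.abs(M[k, k:n])))
        if M[k, p] == 0.0:
            raise np.linalg.LinAlgError("matrix is singular")
        if p != k:
            M[:, [k, p]] = M[:, [p, k]]
            perm[[k, p]] = perm[[p, k]]

        # Scaling: normalize row k so that its diagonal entry becomes one.
        M[k, k + 1:] /= M[k, k]

        # Column elimination: annihilate the entries above the diagonal
        # in column k by updating the trailing part of rows 0..k-1.
        M[:k, k + 1:] -= np.outer(M[:k, k], M[k, k + 1:])

    # The last column holds the solution in pivoted order; undo the pivoting.
    x = np.empty(n)
    x[perm] = M[:, n]
    return x


if __name__ == "__main__":
    A = np.array([[2.0, 1.0, 1.0],
                  [1.0, 3.0, 2.0],
                  [1.0, 0.0, 0.0]])
    b = np.array([4.0, 5.0, 6.0])
    x = gauss_huard_solve(A, b)
    print(np.allclose(A @ x, b))
```

Each iteration touches only row k and the rows above it, which is the "lazy" flavor that distinguishes the method from conventional Gauss-Jordan elimination. In a blocked variant, the vector-matrix product and the rank-1 update would become matrix multiplications, which is where the expected performance on multi-layered memories comes from.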
