Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs

Support for lower-precision computation is becoming more common in accelerator hardware because it offers lower power usage, reduced data movement, and higher computational performance. However, computational science and engineering (CSE) problems in several domains require double-precision accuracy. This conflict between hardware trends and application needs calls for multiprecision strategies at the level of linear algebra algorithms, so that applications can exploit the hardware to its full potential while still meeting their accuracy requirements. In this paper, we focus on preconditioned sparse iterative linear solvers, a key kernel in several CSE applications, and present a study of multiprecision strategies for accelerating this kernel on GPUs. We seek the best methods for incorporating multiple precisions into the GMRES linear solver; these include iterative refinement and parallelizable preconditioners. Our work presents strategies for determining when multiprecision GMRES will be effective and for choosing the parameters of a multiprecision iterative refinement solver to achieve better performance. Our implementation is based on the Trilinos library and employs Kokkos Kernels for performance portability of linear algebra kernels. Performance results demonstrate the promise of multiprecision approaches and show that even further improvements are possible by optimizing low-level kernels.
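
The abstract names iterative refinement as one way to combine precisions but does not spell out the loop. The following is a minimal sketch of the general idea, not the paper's Trilinos/Kokkos implementation: residuals and solution updates are kept in double precision, while each correction is computed by an inner solver working in single precision. Several things here are assumptions made for illustration: the function names (`fp32`, `matvec`, `jacobi_fp32`, `refine`) are hypothetical, single precision is emulated by rounding through the standard `struct` module, matrices are small dense lists, and a fixed number of Jacobi sweeps stands in for the restarted GMRES inner solve to keep the example short.

```python
import struct


def fp32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matvec(A, x, rnd=float):
    """Dense matrix-vector product; rnd rounds every intermediate result."""
    out = []
    for row in A:
        acc = 0.0
        for a, xi in zip(row, x):
            acc = rnd(acc + rnd(a * xi))
        out.append(acc)
    return out


def norm2(v):
    """Euclidean norm, computed in fp64."""
    return sum(vi * vi for vi in v) ** 0.5


def jacobi_fp32(A32, r, sweeps):
    """Approximate solve of A d = r in emulated fp32.

    Assumption: Jacobi sweeps are a stand-in for a low-precision GMRES(m)
    cycle; they only converge for suitable (e.g. diagonally dominant) A.
    """
    n = len(r)
    r32 = [fp32(ri) for ri in r]
    d = [0.0] * n
    for _ in range(sweeps):
        Ad = matvec(A32, d, fp32)
        d = [fp32(d[i] + fp32(fp32(r32[i] - Ad[i]) / A32[i][i]))
             for i in range(n)]
    return d


def refine(A, b, tol=1e-12, inner_sweeps=10, max_outer=50):
    """Mixed-precision iterative refinement for A x = b (b nonzero).

    Returns (x, outer_iterations, relative_residual).
    """
    A32 = [[fp32(a) for a in row] for row in A]  # low-precision copy of A
    x = [0.0] * len(b)
    bnorm = norm2(b)
    for outer in range(max_outer + 1):
        # Residual and convergence check in fp64.
        Ax = matvec(A, x)
        r = [bi - axi for bi, axi in zip(b, Ax)]
        rel = norm2(r) / bnorm
        if rel <= tol or outer == max_outer:
            return x, outer, rel
        # Correction from the low-precision inner solver, applied in fp64.
        d = jacobi_fp32(A32, r, inner_sweeps)
        x = [xi + di for xi, di in zip(x, d)]
```

As a usage example, calling `refine(A, b)` on a small diagonally dominant tridiagonal system (4 on the diagonal, -1 on the off-diagonals) drives the double-precision relative residual below `1e-12` even though every inner solve runs in emulated single precision. The `inner_sweeps` argument plays the role of the inner-solver parameters that the abstract says must be chosen with care: more work per inner solve means fewer double-precision residual computations, and vice versa.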
