论文信息 - Multistage Mixed Precision Iterative Refinement

Multistage Mixed Precision Iterative Refinement

Low precision arithmetic, in particular half precision (16-bit) floating point arithmetic, is now available in commercial hardware. Using lower precision can offer significant savings in computation and communication costs with proportional savings in energy. Motivated by this, there has been a renewed interest in mixed precision iterative refinement schemes for solving linear systems Ax = b, and new variants of GMRES-based iterative refinement have been developed. Each particular variant with a given combination of precisions leads to different condition number-based constraints for convergence of the backward and forward errors, and each has different performance costs. The constraints for convergence given in the literature are, as an artifact of the analyses, often overly strict in practice, and thus could lead a user to select a more expensive variant when a less expensive one would have sufficed. In this work, we develop a multistage mixed precision iterative refinement solver which aims to combine existing mixed precision approaches to balance performance and accuracy and improve usability. For a user-specified initial combination of precisions, the algorithm begins with the least expensive approach and convergence is monitored via inexpensive computations with quantities produced during the iteration. If slow convergence or divergence is detected using particular stopping criteria, the algorithm switches to use a more expensive, but more reliable variant. A novel aspect of our approach is that, unlike existing implementations, our algorithm first attempts to use “stronger” GMRES-based solvers for the solution update before resorting to increasing the precision(s). In some scenarios, this can avoid the need to refactorize the matrix in higher precision. We perform extensive numerical experiments on a variety of random dense problems and problems from real applications which confirm the benefits of the multistage approach.

Erin Carson | Eda Oktay

[1] Takuya Ina,et al. Prompt Report on Exa-Scale HPL-AI Benchmark , 2020, 2020 IEEE International Conference on Cluster Computing (CLUSTER).

[2] Nicholas J. Higham,et al. Simulating Low Precision Floating-Point Arithmetic , 2019, SIAM J. Sci. Comput..

[3] Nicholas J. Higham,et al. Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.

[4] Nicholas J. Higham,et al. A New Analysis of Iterative Refinement and Its Application to Accurate Solution of Ill-Conditioned Sparse Linear Systems , 2017, SIAM J. Sci. Comput..

[5] J. Dongarra,et al. Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems) , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[6] Nicholas J. Higham,et al. Squeezing a Matrix into Half Precision, with an Application to Solving Linear Systems , 2019, SIAM J. Sci. Comput..

[7] Jack J. Dongarra,et al. Towards dense linear algebra for hybrid GPU accelerated manycore systems , 2009, Parallel Comput..

[8] J. H. Wilkinson,et al. Solution of real and complex systems of linear equations , 1966 .

[9] Nicholas J. Higham,et al. Accelerating the Solution of Linear Systems by Iterative Refinement in Three Precisions , 2018, SIAM J. Sci. Comput..

[10] James Demmel,et al. Extra-Precise Iterative Refinement for Overdetermined Least Squares Problems , 2009, TOMS.

[11] James Demmel,et al. Error bounds from extra-precise iterative refinement , 2006, TOMS.