High Performance Computing in Science and Engineering

Stencil calculations and matrix-free Krylov subspace solvers are important components of many scientific computing applications. In these solvers, stencil applications often dominate the computation, so an efficient parallel implementation of the stencil kernel is crucial for reducing the time to solution. Inspired by polynomial preconditioning, we remove the upper bound on the arithmetic intensity of the Krylov subspace building block by replacing the matrix with a higher-degree matrix polynomial. State-of-the-art stencil compilers with temporal blocking reduce memory bandwidth usage and thereby allow better utilization of SIMD vectorization, which yields speedups on modern hardware. Using them, we obtain performance improvements for higher polynomial degrees than simpler cache-blocking approaches have yielded in the past, demonstrating the new appeal of polynomial techniques on emerging architectures. We present results in a shared-memory environment and an extension to a distributed-memory environment with local shared memory.
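The idea of replacing the matrix with a matrix polynomial can be illustrated with a minimal sketch. It is not the implementation described above: the 1D three-point Laplacian stencil, the zero boundary values, and the names `apply_stencil` and `apply_polynomial` are assumptions chosen only for illustration. The sketch evaluates p(A) v by Horner's rule as a chain of back-to-back stencil sweeps, which is the kind of sweep sequence that temporal blocking could fuse to cut memory traffic.

```python
def apply_stencil(v):
    """Matrix-free 1D Laplacian stencil [-1, 2, -1] with zero boundary values.

    Illustrative operator only; the source does not specify a stencil.
    """
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out[i] = 2.0 * v[i] - left - right
    return out


def apply_polynomial(coeffs, v):
    """Return p(A) v for p(x) = coeffs[0] + coeffs[1] x + ... + coeffs[d] x^d.

    Horner's rule: p(A) v = c0 v + A (c1 v + A (... + A (cd v))).
    This performs d consecutive stencil sweeps without forming A.
    """
    result = [coeffs[-1] * x for x in v]
    for c in reversed(coeffs[:-1]):
        swept = apply_stencil(result)
        result = [s + c * x for s, x in zip(swept, v)]
    return result


if __name__ == "__main__":
    v = [1.0, 0.0, 0.0, 0.0]
    # Degree-2 polynomial p(x) = 1 + x^2, so p(A) v = v + A (A v).
    print(apply_polynomial([1.0, 0.0, 1.0], v))
```

A Krylov method built on p(A) instead of A performs d stencil sweeps per operator application, so the arithmetic per byte loaded can grow with d when the sweeps are fused; this plain-Python version makes no attempt at such fusion.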
