TeaLeaf: A Mini-Application to Enable Design-Space Explorations for Iterative Sparse Linear Solvers

Iterative sparse linear solvers are an important class of algorithms in high-performance computing and form a crucial component of many scientific codes. As intra- and inter-node parallelism continues to increase rapidly, the design of new, scalable solvers that can target next-generation architectures becomes increasingly important. In this work we present TeaLeaf, a recent mini-app constructed to explore design-space choices for highly scalable solvers. We then use TeaLeaf to compare the standard Conjugate Gradient (CG) algorithm with a Chebyshev Polynomially Preconditioned Conjugate Gradient (CPPCG) iterative sparse linear solver. CPPCG is a communication-avoiding algorithm that requires less global communication than previous approaches. TeaLeaf includes support for many-core processors, such as GPUs and Xeon Phi, and we present strong-scaling results across a range of world-leading petascale supercomputers, including Titan and Piz Daint.
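
To make the communication argument concrete, the following is a minimal sketch, not TeaLeaf's implementation. It assumes a 1D model matrix A = tridiag(-1, 2.1, -1), eigenvalue bounds taken from Gershgorin's theorem, and a textbook Chebyshev iteration used as the polynomial preconditioner. The names matvec, dot, chebyshev_apply and solve are hypothetical. Every call to dot stands in for a global reduction, whereas matvec needs only nearest-neighbour data, so the counter shows where the preconditioned solver saves global communication.

```python
import math

DIAG = 2.1                            # assumed model problem: A = tridiag(-1, DIAG, -1)
LMIN, LMAX = DIAG - 2.0, DIAG + 2.0   # Gershgorin bounds on the eigenvalues of A


def matvec(x):
    """y = A x. Each entry needs only its two neighbours (a halo exchange in parallel)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y[i] = DIAG * x[i] - left - right
    return y


def dot(u, v, stats):
    """Inner product. In a distributed code this is a global reduction, so count it."""
    stats["reductions"] += 1
    return sum(a * b for a, b in zip(u, v))


def chebyshev_apply(r, steps):
    """z ~ A^{-1} r from `steps` Chebyshev iterations with a zero initial guess.

    Uses only matvecs and vector updates: no dot products, hence no reductions.
    """
    theta = 0.5 * (LMAX + LMIN)
    delta = 0.5 * (LMAX - LMIN)
    sigma = theta / delta
    rho = 1.0 / sigma
    res = list(r)
    d = [ri / theta for ri in res]
    z = [0.0] * len(r)
    for _ in range(steps):
        z = [zi + di for zi, di in zip(z, d)]
        ad = matvec(d)
        res = [ri - adi for ri, adi in zip(res, ad)]
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = [rho_new * rho * di + (2.0 * rho_new / delta) * ri
             for di, ri in zip(d, res)]
        rho = rho_new
    return z


def solve(b, cheb_steps=0, tol=1e-20, max_iter=1000):
    """Preconditioned CG on A x = b. cheb_steps=0 gives plain CG (z = r).

    Stops when r.z has dropped by the factor `tol` relative to its initial value.
    Returns (x, outer_iterations, global_reductions).
    """
    stats = {"reductions": 0}
    precond = (lambda v: chebyshev_apply(v, cheb_steps)) if cheb_steps else list
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z, stats)
    rz0 = rz
    iters = 0
    while rz > tol * rz0 and iters < max_iter:
        ap = matvec(p)
        alpha = rz / dot(p, ap, stats)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = precond(r)
        rz_new = dot(r, z, stats)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
        iters += 1
    return x, iters, stats["reductions"]


if __name__ == "__main__":
    b = [math.sin(i + 1.0) for i in range(200)]
    for steps in (0, 8):
        _, iters, reductions = solve(b, cheb_steps=steps)
        label = "CG" if steps == 0 else "CPPCG sketch (%d inner steps)" % steps
        print(label, "outer iterations:", iters, "global reductions:", reductions)
```

In this sketch each outer iteration costs two reductions in both solvers, but the polynomial preconditioner cuts the number of outer iterations, trading global reductions for additional matrix-vector products that need only neighbour communication. The inner step count of 8 is an arbitrary illustrative choice.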
