Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects

We first briefly report on the status and recent achievements of the ELPA-AEO (Eigenvalue Solvers for Petaflop Applications: Algorithmic Extensions and Optimizations) and ESSEX-II (Equipping Sparse Solvers for Exascale) projects. In both collaborative efforts, application scientists, mathematicians, and computer scientists work together to develop and provide efficient, highly parallel methods for solving eigenvalue problems. We then focus on a topic addressed in both projects: the use of mixed precision computations to enhance efficiency. We describe in more detail our approaches for benefiting from either lower or higher precision in three selected contexts, and report the results obtained.
