Parallel Shift-Invert Spectrum Slicing on Distributed Architectures with GPU Accelerators

The solution of large-scale eigenvalue problems (EVPs) is often the computational bottleneck in scientific and engineering applications. Traditional eigensolvers, such as direct methods (e.g., those implemented in ScaLAPACK) and Krylov subspace methods (e.g., Lanczos), have struggled to achieve high scalability on large computing resources because of communication and synchronization bottlenecks inherent in their implementation. It has likewise proven difficult to develop well-performing ports of these algorithms to architectures that rely on accelerators, such as graphics processing units (GPUs), for the majority of their floating-point operations. Recently, there has been significant research into eigensolvers based on spectrum slicing, in particular shift-invert spectrum slicing, to alleviate these bottlenecks. In general, spectrum slicing trades the global EVP for many smaller, independent EVPs whose solutions may be combined to assemble a desired subset of the eigenspectrum. The result is a method that performs more floating-point operations than traditional eigensolvers, but in a way that exposes massive concurrency, leading to an overall improvement in time-to-solution on large computing resources. In this work, we examine the performance of parallel shift-invert spectrum slicing on modern GPU clusters using state-of-the-art linear algebra software.
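
To make the idea of trading one global EVP for many independent ones concrete, the following is a minimal serial sketch in Python with NumPy, not the implementation evaluated in this work. It assumes a small dense symmetric matrix, equal-width slices with the shift at each slice midpoint, and plain shift-invert subspace iteration; the function names `slice_eigs` and `spectrum_slicing`, the block size, the iteration count, and the residual tolerance are all illustrative choices. A dense inverse stands in for the factorization of the shifted matrix that a real solver would compute once per shift.

```python
import numpy as np

def slice_eigs(A, sigma, lo, hi, block=8, iters=60, tol=1e-8, seed=0):
    """Eigenpairs of symmetric A in [lo, hi) via shift-invert subspace iteration.

    Assumes `block` exceeds the number of eigenvalues in the slice and that
    sigma is not itself an eigenvalue of A.
    """
    n = A.shape[0]
    # Stand-in for factoring (A - sigma*I) once and reusing the factors.
    shifted_inv = np.linalg.inv(A - sigma * np.eye(n))
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((n, block)))
    for _ in range(iters):
        # Applying the inverse amplifies eigenvectors whose eigenvalues lie near sigma.
        X, _ = np.linalg.qr(shifted_inv @ X)
    # Rayleigh-Ritz projection onto the converged subspace.
    theta, S = np.linalg.eigh(X.T @ A @ X)
    V = X @ S
    resid = np.linalg.norm(A @ V - V * theta, axis=0)
    # Keep only converged Ritz pairs that belong to this slice.
    keep = (theta >= lo) & (theta < hi) & (resid < tol)
    return theta[keep], V[:, keep]

def spectrum_slicing(A, a, b, n_slices):
    """Assemble the eigenpairs of A in [a, b) from independent slices."""
    edges = np.linspace(a, b, n_slices + 1)
    vals, vecs = [], []
    # Each pass through this loop is independent of the others, so in a
    # parallel setting every slice could be handled by a separate process group.
    for lo, hi in zip(edges[:-1], edges[1:]):
        w, V = slice_eigs(A, 0.5 * (lo + hi), lo, hi)
        vals.append(w)
        vecs.append(V)
    return np.concatenate(vals), np.hstack(vecs)

# Illustrative test matrix with known spectrum 1, 2, ..., 40.
n = 40
Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((n, n)))
A = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T
A = 0.5 * (A + A.T)  # remove rounding asymmetry

vals, vecs = spectrum_slicing(A, 10.25, 30.25, n_slices=4)
print(np.round(np.sort(vals), 6))
```

The loop over slices involves no communication between shifts, which is the source of concurrency described above; the price is one factorization per shift rather than a single global reduction. A robust solver would additionally verify that no eigenvalue in a slice was missed, for example with inertia counts from a symmetric indefinite factorization, which this sketch omits.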
