Parallel, multigrain iterative solvers for hiding network latencies on MPPs and networks of clusters

Parallel iterative solvers are often the only means of solving large linear systems and eigenproblems. However, these solvers are usually implemented in a fine-grain manner and can incur significant performance penalties due to synchronization overheads on large MPPs. This problem is exacerbated in clusters of workstations (COWs) and SMPs that are interconnected via a hierarchy of networks. In this paper, we describe a novel scheme for hiding the synchronization overheads, and thus improving scalability, of block iterative solvers that employ a correction equation through an inner iterative method.

Block methods are not only robust in the presence of eigenvalue multiplicities and multiple right-hand sides, but provide better latency tolerance by performing more floating-point operations between synchronizations. We take a different approach to inducing latency tolerance by increasing the granularity at which the correction equation is solved for each block vector. This is accomplished by splitting the processors into smaller subgroups which are then used to solve the correction for each block vector concurrently. The rest of the algorithm is still performed in fine grain. We call this combination of fine and coarse-grain parallelism multigrain parallelism.

We implemented a multigrain, block Jacobi-Davidson algorithm for computing the extreme eigenvalues of a symmetric matrix. We obtained improvements of 45-50% over both the block and non-block implementations of the fine-grain method when testing on an IBM SP and on a collection of clusters of Sun workstations.
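The processor splitting described above can be illustrated with a small sketch. It is not the authors' implementation: the function names `split_into_subgroups` and `subgroup_of`, and the choice of contiguous, near-equal subgroups, are assumptions made for illustration only. In an MPI code the resulting subgroup index would typically serve as the `color` argument of `MPI_Comm_split`, so that each subgroup gets its own communicator for the inner solve.

```python
def split_into_subgroups(num_procs, block_size):
    """Partition ranks 0..num_procs-1 into `block_size` contiguous subgroups.

    Hypothetical helper: one subgroup per block vector, so that each
    correction equation can be solved concurrently by its own subgroup.
    Subgroup sizes differ by at most one processor.
    """
    if block_size < 1 or num_procs < block_size:
        raise ValueError("need at least one processor per block vector")
    base, extra = divmod(num_procs, block_size)
    groups = []
    start = 0
    for g in range(block_size):
        # The first `extra` subgroups absorb the leftover processors.
        size = base + (1 if g < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def subgroup_of(rank, groups):
    """Return the index of the subgroup (and block vector) owning `rank`."""
    for index, members in enumerate(groups):
        if rank in members:
            return index
    raise ValueError(f"rank {rank} is not in any subgroup")


if __name__ == "__main__":
    # Fine-grain phase: all 8 processors cooperate on the outer iteration.
    # Coarse-grain phase: 4 subgroups of 2 each solve one correction equation.
    groups = split_into_subgroups(num_procs=8, block_size=4)
    for index, members in enumerate(groups):
        print(f"block vector {index}: processors {members}")
```

Contiguous rank ranges are chosen here on the assumption that neighboring ranks share a cluster or SMP node, which would keep the inner solver's synchronizations on the fastest network level; a real mapping would depend on the machine's actual topology.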
