Paper information - Parallel, multigrain iterative solvers for hiding network latencies on MPPs and networks of clusters

Parallel iterative solvers are often the only means of solving large linear systems and eigenproblems. However, these solvers are usually implemented in a fine-grain manner and can incur significant performance penalties due to synchronization overheads on large MPPs. This problem is exacerbated in clusters of workstations (COWs) and SMPs that are interconnected via a hierarchy of networks. In this paper, we describe a novel scheme for hiding the synchronization overheads, and thus improving scalability, of block iterative solvers that employ a correction equation through an inner iterative method.

Block methods are not only robust in the presence of eigenvalue multiplicities and multiple right-hand sides, but provide better latency tolerance by performing more floating-point operations between synchronizations. We take a different approach to inducing latency tolerance by increasing the granularity at which the correction equation is solved for each block vector. This is accomplished by splitting the processors into smaller subgroups which are then used to solve the correction for each block vector concurrently. The rest of the algorithm is still performed in fine grain. We call this combination of fine and coarse-grain parallelism multigrain parallelism.

We implemented a multigrain, block Jacobi-Davidson algorithm for computing the extreme eigenvalues of a symmetric matrix. We obtained improvements of 45-50% over both the block and non-block implementations of the fine-grain method when testing on an IBM SP and on a collection of clusters of Sun workstations.
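The processor-splitting step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names `split_into_subgroups` and `subgroup_of` are hypothetical, and the contiguous, near-equal partition is an assumption made only for illustration. Each subgroup would solve the correction equation for one block vector, while the rest of the algorithm runs across all processors.

```python
def split_into_subgroups(num_procs, block_size):
    """Partition ranks 0..num_procs-1 into block_size contiguous subgroups.

    Assumption for illustration: one subgroup per block vector, with
    subgroup sizes differing by at most one processor.
    """
    if block_size < 1 or num_procs < block_size:
        raise ValueError("need at least one processor per block vector")
    base, extra = divmod(num_procs, block_size)
    groups = []
    start = 0
    for g in range(block_size):
        # The first `extra` subgroups take one additional processor.
        size = base + (1 if g < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def subgroup_of(rank, groups):
    """Return the index of the subgroup (and block vector) owning `rank`."""
    for index, members in enumerate(groups):
        if rank in members:
            return index
    raise ValueError(f"rank {rank} is not in any subgroup")


if __name__ == "__main__":
    groups = split_into_subgroups(num_procs=8, block_size=4)
    for index, members in enumerate(groups):
        print(f"block vector {index}: processors {members}")
```

In an MPI program, the subgroup index could serve as the color argument to `MPI_Comm_split`, so that each subgroup gets its own communicator for the inner solve.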

Andreas Stathopoulos | James R. McCombs
