Unifying and Optimizing Parallel Linear Algebra Algorithms

Two issues in linear algebra algorithms for multicomputers are addressed: first, how to unify parallel implementations of the same algorithm in a decomposition-independent way; second, how to optimize naive parallel programs while maintaining that decomposition independence. Several matrix decompositions are viewed as instances of a more general allocation function called subcube matrix decomposition. Through this meta-decomposition, a programming environment is defined, characterized by general primitives that allow one to design meta-algorithms independently of any particular decomposition. The authors apply this framework to the parallel solution of dense matrices and demonstrate that most of the existing algorithms can be derived by suitably setting the primitives used in the meta-algorithm. A further application of this programming style concerns the optimization of parallel algorithms. The idea of overlapping communication and computation is extended from 1-D decompositions to 2-D decompositions. Thus, a first attempt towards a decomposition-independent definition of such optimization strategies is provided.
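The idea of treating several decompositions as instances of one general allocation function can be illustrated with a minimal sketch. It is not the authors' subcube matrix decomposition, whose definition is not given here. It assumes a simple wrapped (cyclic) mapping of matrix entries onto a processor grid, and all names (`grid_owner`, `row_wrapped`, `column_wrapped`, `local_entries`) are hypothetical. Code written against the `owner` function alone does not need to know which decomposition is in use.

```python
from typing import Callable, List, Tuple

# Hypothetical names throughout; this is an illustration, not the paper's notation.
Owner = Callable[[int, int], Tuple[int, int]]


def grid_owner(p_rows: int, p_cols: int) -> Owner:
    """Wrapped (cyclic) allocation of entry (i, j) onto a p_rows x p_cols grid."""
    def owner(i: int, j: int) -> Tuple[int, int]:
        return (i % p_rows, j % p_cols)
    return owner


def row_wrapped(p: int) -> Owner:
    """1-D row decomposition: the special case of a p x 1 grid."""
    return grid_owner(p, 1)


def column_wrapped(p: int) -> Owner:
    """1-D column decomposition: the special case of a 1 x p grid."""
    return grid_owner(1, p)


def local_entries(owner: Owner, me: Tuple[int, int], n: int) -> List[Tuple[int, int]]:
    """Entries of an n x n matrix held by processor `me`, for any decomposition."""
    return [(i, j) for i in range(n) for j in range(n) if owner(i, j) == me]


if __name__ == "__main__":
    for name, owner in [("row", row_wrapped(4)),
                        ("column", column_wrapped(4)),
                        ("2-D grid", grid_owner(2, 2))]:
        print(name, "owner of (5, 2):", owner(5, 2))
```

Swapping `row_wrapped(4)` for `grid_owner(2, 2)` changes the data layout without touching `local_entries`, which is the kind of decomposition independence the abstract describes.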
