Parallel Matrix Inversion on a Subcube-Grid

Abstract: In this paper we propose a new medium-grain parallel algorithm for computing a matrix inverse on a hypercube multiprocessor. The algorithm implements Gauss-Jordan inversion with column interchanges. The hypercube network is configured as a two-dimensional subcube-grid to support submatrix partitionings. For some algorithms on some types of hypercubes, submatrix partitionings are known to have communication advantages not shared by partitionings limited to rows or columns. We show that such advantages extend to Gauss-Jordan inversion on an Intel iPSC/860, the most current (third-generation) hypercube, and that little extra programming effort is needed to include the algorithm in the subcube-grid library used in various other matrix computations. An actual aggregate execution rate of 200 MFLOPS (million floating-point operations per second) is achieved when inverting a 2000 × 2000 matrix (in double-precision Fortran 77) using 64 iPSC/860 processors configured as an 8 × 8 subcube-grid.
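For readers unfamiliar with the underlying method, the following is a minimal serial sketch of Gauss-Jordan inversion with column interchanges. It is not the paper's parallel implementation and says nothing about how the work is mapped onto the subcube-grid. The function name `gauss_jordan_inverse`, the augmented-matrix layout, and the choice of the largest-magnitude entry in the current row as pivot are assumptions made for illustration only.

```python
def gauss_jordan_inverse(a):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination,
    choosing each pivot along the current row and swapping columns."""
    n = len(a)
    # Augmented working array [A | I]; row operations turn it into [I | E].
    work = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    # perm[j] = index of the original column of A now sitting at position j.
    perm = list(range(n))

    for k in range(n):
        # Pivot search in row k over the not-yet-reduced columns k..n-1.
        p = max(range(k, n), key=lambda j: abs(work[k][j]))
        if work[k][p] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            # Column interchange, applied to the left (A) part only.
            for row in work:
                row[k], row[p] = row[p], row[k]
            perm[k], perm[p] = perm[p], perm[k]

        # Scale the pivot row so the pivot becomes 1.
        pivot = work[k][k]
        work[k] = [v / pivot for v in work[k]]

        # Eliminate column k from every other row (above and below).
        for i in range(n):
            if i != k:
                f = work[i][k]
                if f != 0.0:
                    work[i] = [vi - f * vk for vi, vk in zip(work[i], work[k])]

    # The right half now holds E = (A P)^-1, so A^-1 = P E:
    # row j of E belongs at row perm[j] of the inverse.
    inv = [None] * n
    for j in range(n):
        inv[perm[j]] = work[j][n:]
    return inv
```

Because the pivot is taken from the current row, the interchanges permute columns of the matrix rather than rows, and the only bookkeeping needed is the final row permutation of the result.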
