Matrix Partitioning on a Virtual Shared Memory Parallel Machine

The paper addresses the problem of partitioning a matrix operation among the processors of a parallel system in an optimal, load-balanced way without potential memory contention. The parallel system considered is defined by several features, chief among them the availability of a virtual shared memory divided into segments. If a partitioning causes processors to access the same memory segment in parallel and at least one of them writes to it, contention arises between the processors and performance degrades. To eliminate this situation, the class of admissible partitionings is restricted so that no two processors write data to the same segment. Within the resulting class of contention-free partitionings, a load-balanced optimal partitioning is defined as one satisfying independent minimax criteria. The main result of the paper is an algorithm that finds the optimal partitioning by solving the respective minimax problems analytically. The paper also discusses implementation and performance issues related to the algorithm, based on experience at Kendall Square Research Corporation, where the partitioning algorithm was used to create high-performance parallel matrix libraries.
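
The contention-free restriction can be illustrated with a deliberately simplified one-dimensional sketch. It is not the algorithm from the paper: it assumes a contiguous array of `n` elements written by `p` processors, with a memory segment holding `seg` elements and the array starting on a segment boundary. The function name and all parameters are hypothetical. Each range boundary is forced onto a segment boundary, so no two processors write to the same segment, and whole segments are spread as evenly as possible to keep the largest range small.

```python
def contention_free_partition(n, p, seg):
    """Split indices [0, n) into p contiguous half-open ranges.

    Assumptions (illustrative only, not taken from the paper):
    the array starts on a segment boundary and `seg` is the number
    of elements per memory segment. Every range boundary is a
    multiple of `seg`, so no segment is written by two processors.
    """
    if n < 0 or p <= 0 or seg <= 0:
        raise ValueError("n must be >= 0; p and seg must be positive")

    n_segs = -(-n // seg)            # ceiling division: segments touched
    base, extra = divmod(n_segs, p)  # every rank gets `base` segments

    ranges = []
    start_seg = 0
    for rank in range(p):
        # The last `extra` ranks take one more segment each, so the
        # possibly partial final segment lands on a rank with an extra.
        count = base + (1 if rank >= p - extra else 0)
        end_seg = start_seg + count
        lo = min(start_seg * seg, n)
        hi = min(end_seg * seg, n)
        ranges.append((lo, hi))
        start_seg = end_seg
    return ranges


if __name__ == "__main__":
    for lo, hi in contention_free_partition(100, 4, 16):
        print(lo, hi, hi - lo)
```

An unconstrained even split of 100 elements over 4 processors would give 25 elements each, but the boundaries at 25, 50 and 75 fall inside segments of 16 elements. The aligned split trades some load balance for the guarantee that each segment has a single writer.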
