The design and evaluation of a shared object system for distributed memory machines

This paper describes the design and evaluation of SAM, a shared object system for distributed memory machines. SAM is a portable run-time system that provides a global name space and automatic caching of shared data. SAM incorporates mechanisms to address the problem of high communication overheads on distributed memory machines; these mechanisms include tying synchronization to data access, chaotic access to data, prefetching of data, and pushing of data to remote processors. SAM has been implemented on the CM-5, Intel iPSC/860 and Paragon, IBM SP1, and networks of workstations running PVM. SAM applications run on all these platforms without modification. This paper provides an extensive analysis of several complex scientific algorithms written in SAM on a variety of hardware platforms. We find that the performance of these SAM applications depends fundamentally on the scalability of the underlying parallel algorithm and on whether the algorithm's communication requirements can be satisfied by the hardware. Our experience suggests that SAM is successful in allowing programmers to use distributed memory machines effectively with much less programming effort than is required today.
