Issues in the design of distributed shared memory systems

This thesis examines the system issues that arise in the design of distributed shared memory (DSM) systems. The work is motivated by the observation that distributed systems will continue to grow in popularity and will increasingly be used to solve large computational problems. To this end, the shared memory paradigm is attractive for programming large distributed systems because it offers the programmer a natural transition from the world of uniprocessors. The goal of this work is to identify a set of system issues, such as the integration of DSM with virtual memory management, the choice of memory model, the choice of coherence protocol, and technology factors, and to evaluate the effects of the design alternatives on the performance of DSM systems. The specific question we try to answer is: "Can we determine a set of system design parameters that defines an efficient realization of a distributed shared memory system?" The design alternatives are evaluated in three steps. First, we carry out a detailed performance study of a distributed shared memory implementation on Clouds, a distributed object-based operating system developed at Georgia Tech. Second, we implement several applications on a distributed shared memory system and analyze their performance. Third, the system issues that could not be evaluated in the experimental study are evaluated using a simulation-based approach. The simulation model is developed from our experience with the Clouds distributed system. A new workload model that captures the salient features of parallel and distributed programs is developed and used to drive the simulator.
The key results of the thesis are: DSM mechanisms have to be integrated with virtual memory management to provide high-performance distributed shared memory systems; the choice of memory model and coherence protocol does not significantly influence system performance for applications that exhibit high computation granularity and low state sharing; and an efficient implementation of DSM requires careful design of miscellaneous system services (such as synchronization and data servers). The thesis also enumerates several questions related to future research directions.
