A Comparative Study of Distributed Shared Memory System Design Issues

This research examines the issues that arise in the design and implementation of distributed shared memory (DSM) systems. The work is motivated by two observations: distributed systems will continue to grow in popularity and will increasingly be used to solve large computational problems; and the shared memory paradigm is attractive for programming large distributed systems because it offers the programmer a natural transition from the world of uniprocessors. The goal of this work is to identify a set of system issues in applying the shared memory paradigm to a distributed system, and to evaluate the effects of the ensuing design alternatives on the performance of DSM systems. The design alternatives are evaluated in two steps. First, we undertake a detailed measurement-based study of a distributed shared memory implementation on the Clouds distributed operating system in order to understand the system issues. Second, we use a simulation-based approach to evaluate those issues. A new workload model that captures the salient features of parallel and distributed programs is developed and used to drive the simulator. The key results of the research are that the choice of memory model and coherence protocol does not significantly influence system performance for applications exhibiting high computation granularity and low state-sharing; weaker memory models become significant for large-scale DSM systems; the choice of the unit of coherence maintenance depends on a set of parameters, including the overheads of servicing data requests and the speed of data transmission on the network; and the design of miscellaneous system services (such as synchronization and data servers) can play an important role in the performance of DSM systems.