An Implementation of Mermera: A Shared Memory System that Mixes Coherence with Non-coherence

Abstract: Coherent shared memory is a convenient, but inefficient, method of inter-process communication for parallel programs. By contrast, message passing can be less convenient, but more efficient. To get the benefits of both models, several non-coherent memory behaviors have recently been proposed in the literature. We present an implementation of Mermera, a shared memory system that supports both coherent and non-coherent behaviors in a manner that enables programmers to mix multiple behaviors in the same program~\cite{HeddayaS93}. A programmer can debug a Mermera program using coherent memory, and then improve its performance by selectively reducing the level of coherence in the parts that are critical to performance. Mermera thus permits a trade-off of coherence for performance. We analyze this trade-off through measurements of our implementation, and by an example that illustrates the style of programming needed to exploit non-coherence. We find that, even on a small network of workstations, the performance advantage of non-coherence is compelling. Raw non-coherent memory operations perform 20-40~times better than coherent memory operations. An example application program is shown to run 5-11~times faster when permitted to exploit non-coherence. We conclude by commenting on our use of the Isis Toolkit of multicast protocols in implementing Mermera.
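The style of programming that tolerates non-coherence can be illustrated with a toy sketch. It is not Mermera's interface and is not taken from the implementation; the function `solve`, the `staleness` parameter, and the matrix values are all hypothetical. The sketch assumes the application is a fixed-point iteration of the kind studied in the asynchronous iterative methods literature, and it models reduced coherence simply as readers seeing values that are a few sweeps old.

```python
# Toy illustration only: NOT Mermera's API. All names and numbers are assumed.

def solve(staleness, sweeps=200):
    """Iterate x_i = b_i + sum_j A[i][j] * x_j.

    staleness = 0 models coherent reads (every read sees the latest sweep).
    staleness > 0 models non-coherent reads: values seen are that many
    sweeps old, as if updates propagated lazily.
    """
    # Row sums of |A| are at most 0.5, so the iteration is a contraction
    # and converges even when reads are stale by a bounded amount.
    A = [[0.0, 0.3, 0.2],
         [0.1, 0.0, 0.4],
         [0.2, 0.3, 0.0]]
    b = [1.0, 2.0, 3.0]
    n = len(b)
    history = [[0.0] * n]  # history[k] is the vector after k sweeps
    for _ in range(sweeps):
        # Pick the snapshot a reader would observe under the given staleness.
        seen = history[max(0, len(history) - 1 - staleness)]
        history.append([b[i] + sum(A[i][j] * seen[j] for j in range(n))
                        for i in range(n)])
    return history[-1]


coherent = solve(staleness=0)  # every read is up to date
relaxed = solve(staleness=3)   # reads may lag three sweeps behind
```

Both calls approach the same fixed point; the relaxed run merely needs more sweeps to get there. A computation with this property is the kind of candidate for which a programmer might first debug under coherent memory and then relax coherence in the performance-critical part.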

[1] Mustaque Ahamad, et al. Slow memory: weakening consistency to enhance concurrency in distributed shared memories, 1990, Proceedings of the 10th International Conference on Distributed Computing Systems.

[2] Gérard M. Baudet, et al. Asynchronous Iterative Methods for Multiprocessors, 1978, JACM.

[3] Willy Zwaenepoel, et al. Implementation and performance of Munin, 1991, SOSP '91.

[4] Robert A. Whiteside, et al. Implementing Linda for distributed and parallel processing, 1989, ICS '89.

[5] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.

[6] Abdelsalam Heddaya, et al. Coherence, Non-coherence and Local Consistency in Distributed Shared Memory for Parallel Computing, 1992.

[7] André Schiper, et al. Lightweight causal and atomic group multicast, 1991, TOCS.

[8] Kenneth P. Birman, et al. Exploiting virtual synchrony in distributed systems, 1987, SOSP '87.

[9] Robbert van Renesse, et al. Reliable Multicast between Micro-Kernels, 1992, USENIX Workshop on Microkernels and Other Kernel Architectures.

[10] Anant Agarwal, et al. LimitLESS directories: A scalable cache coherence scheme, 1991, ASPLOS IV.

[11] Anoop Gupta, et al. The DASH Prototype: Logic Overhead and Performance, 1993, IEEE Trans. Parallel Distributed Syst.

[12] Stephen E. Deering, et al. Multicast routing in datagram internetworks and extended LANs, 1990, TOCS.

[13] Michel Dubois, et al. Concurrent Miss Resolution in Multiprocessor Caches, 1988, ICPP.

[14] Himanshu Shekhar Sinha. Mermera: non-coherent distributed shared memory for parallel computing, 1993.

[15] Mustaque Ahamad, et al. Implementing and programming causal distributed shared memory, 1991, Proceedings of the 11th International Conference on Distributed Computing Systems.

[16] Jack Dongarra, et al. A User's Guide to PVM Parallel Virtual Machine, 1991.

[17] Mosur Ravishankar, et al. Programming the PLUS Distributed-Memory System, 1990, Proceedings of the Fifth Distributed Memory Computing Conference.

[18] Paul Hudak, et al. Memory coherence in shared virtual memory systems, 1986, PODC '86.

[19] Anoop Gupta, et al. Memory consistency and event ordering in scalable shared-memory multiprocessors, 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.

[20] William E. Weihl, et al. Multi-version memory: software cache management for concurrent B-trees, 1990, Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing.

[21] Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, 1979, IEEE Transactions on Computers.