Making distributed shared memory simple, yet efficient

Recent research on distributed shared memory (DSM) has focused on improving performance by reducing DSM's communication overhead. Added features include coherence protocols based on lazy release consistency and new interfaces that let programmers hand-tune communication. These features have improved DSM performance, but at the cost of increasingly complex DSM systems or increasingly cumbersome programming. They have also increased the computation overhead of DSM, which has partially offset the communication-related performance gains. We instead implemented a simple DSM system, Quarks, designed to hide most computation overhead while using a very low-latency transport layer to reduce the effect of communication overhead. The resulting performance is comparable to that of far more complex DSM systems, such as TreadMarks and Cashmere.
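For context, the access-trapping mechanism that page-based DSM systems such as Quarks rely on can be illustrated with a short C sketch. This is a minimal illustration under stated assumptions, not Quarks's actual code: the shared region is mapped inaccessible, a SIGSEGV handler traps the first touch of each page, and the hypothetical fetch_page_from_owner() stands in for the network request a real DSM system would issue before re-enabling access with mprotect().

/* Minimal sketch of page-fault-driven DSM coherence (Unix-like systems).
 * NOT the Quarks implementation; fetch_page_from_owner() is a
 * hypothetical placeholder for the DSM network request. */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_PAGES 16

static long page_size;
static char *shared_region;   /* DSM-managed memory */

/* Hypothetical stand-in: a real DSM system would request the current
 * copy of the page from its owner and block until it arrives. */
static void fetch_page_from_owner(void *page_addr)
{
    memset(page_addr, 0, (size_t)page_size);
}

/* Trap the first access to a protected page, "fetch" it, and re-enable
 * access so the faulting instruction restarts and succeeds.
 * (mprotect() in a signal handler is not strictly async-signal-safe,
 * but this trap-and-retry pattern is the standard technique in
 * user-level page-based DSM systems.) */
static void dsm_fault_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t mask = ~((uintptr_t)page_size - 1);
    char *page = (char *)((uintptr_t)info->si_addr & mask);

    mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE);
    fetch_page_from_owner(page);
}

int main(void)
{
    page_size = sysconf(_SC_PAGESIZE);

    /* Map the shared region with no access so every first touch faults. */
    shared_region = mmap(NULL, REGION_PAGES * (size_t)page_size, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (shared_region == MAP_FAILED) { perror("mmap"); return 1; }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = dsm_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    shared_region[0] = 42;    /* faults once; handler makes page valid */
    printf("first byte after fault handling: %d\n", shared_region[0]);
    return 0;
}

The computation overhead the abstract refers to is the work done inside this fault path (protocol bookkeeping, twin/diff maintenance in lazy-release-consistency systems); Quarks's approach is to keep that work small while a low-latency transport makes the fetch itself cheap.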

[1] Leigh Stoller, et al. PAINT: PA instruction set interpreter, 1996.

[2] Paul Hudak, et al. Memory coherence in shared virtual memory systems, 1986, PODC '86.

[3] Leigh Stoller, et al. Direct deposit: A basic user-level protocol for carpet clusters, 1995.

[4] Srinivasan Parthasarathy, et al. Cashmere-2L: software coherent shared memory on a clustered remote-write network, 1997, SOSP.

[5] Willy Zwaenepoel, et al. Techniques for reducing consistency-related communication in distributed shared-memory systems, 1995, TOCS.

[6] Michel Dubois, et al. Delayed consistency and its effects on the miss rate of parallel programs, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[7] Alan L. Cox, et al. Software versus hardware shared-memory implementation: a case study, 1994, ISCA '94.

[8] Kourosh Gharachorloo, et al. Towards transparent and efficient software distributed shared memory, 1997, SOSP.

[9] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.

[10] Jeffrey S. Chase, et al. The Amber system: parallel programming on a network of multiprocessors, 1989, SOSP '89.

[11] Henri E. Bal, et al. Orca: A Language for Parallel Programming of Distributed Systems, 1992, IEEE Trans. Software Eng.

[12] Anoop Gupta, et al. Memory consistency and event ordering in scalable shared-memory multiprocessors, 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.

[13] Scott Pakin, et al. High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet, 1995, Proceedings of the IEEE/ACM SC95 Conference.

[14] Charles L. Seitz, et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.

[15] Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, 1979, IEEE Transactions on Computers.

[16] Brett D. Fleisch, et al. Mirage: a coherent distributed shared memory design, 1989, SOSP '89.

[17] Partha Dasgupta, et al. The Design and Implementation of the Clouds Distributed Operating System, 1989, Computing Systems.

[18] James R. Larus, et al. Where is time spent in message-passing and shared-memory programs?, 1994, ASPLOS VI.

[19] Kirk L. Johnson, et al. CRL: high-performance all-software distributed shared memory, 1995, SOSP.

[20] D. Lenoski, et al. The SGI Origin: A ccNUMA Highly Scalable Server, 1997, Proceedings of the 24th Annual International Symposium on Computer Architecture.

[21] John Wilkes. Hamlyn: an interface for sender-based communications, 1992.

[22] Brian N. Bershad, et al. The Midway distributed shared memory system, 1993, COMPCON Spring '93 Digest of Papers.