An Asynchronous Protocol for Release Consistent Distributed Shared Memory Systems

Distributed shared memory (DSM) systems provide a simple programming paradigm for networks of workstations, which are gaining popularity because they offer high computing power at low cost. However, DSM systems usually perform poorly because of the large communication delay between nodes, and many memory consistency models have been proposed to mask this network delay. In this paper, we propose an asynchronous protocol for the release consistent memory model, which we call the Asynchronous Release Consistency (ARC) protocol. Unlike other protocols, in which communication follows a synchronous request/receive paradigm, the ARC protocol is asynchronous: the necessary pages are broadcast before they are requested. The network delay can therefore be reduced by prefetching the necessary pages. We compared the performance of the ARC protocol with that of the lazy release protocol by running standard benchmark programs, and the experimental results show that the ARC protocol achieves a performance improvement of up to 29%.
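
The contrast between fetching pages on demand and broadcasting them ahead of time can be illustrated with a toy model. The sketch below is not the ARC implementation: the `Node` class, the `release` and `run` functions, the page names, and the assumption that updated pages are pushed to every peer at release time are all hypothetical choices made for illustration. It only counts how many blocking fetches readers incur in each mode.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A workstation with a local page cache (hypothetical model)."""
    name: str
    cache: dict = field(default_factory=dict)  # page id -> version held locally
    demand_fetches: int = 0                    # blocking round trips on a miss

    def read(self, page, home):
        # A missing or stale copy forces a blocking request for the page.
        if self.cache.get(page) != home[page]:
            self.demand_fetches += 1
            self.cache[page] = home[page]
        return self.cache[page]


def release(writer, dirty_pages, home, peers, push):
    """Publish the writer's updates; if push is True, send them to all peers.

    Assumption: the broadcast happens at release time. The abstract only
    says pages are broadcast before they are requested.
    """
    for page in dirty_pages:
        home[page] = home.get(page, 0) + 1
        writer.cache[page] = home[page]
        if push:
            for peer in peers:
                peer.cache[page] = home[page]


def run(push):
    """Return the total number of blocking fetches made by the readers."""
    home = {}
    a, b, c = Node("a"), Node("b"), Node("c")
    release(a, ["p0", "p1"], home, [b, c], push)
    for reader in (b, c):
        reader.read("p0", home)
        reader.read("p1", home)
    return b.demand_fetches + c.demand_fetches


if __name__ == "__main__":
    print("on-demand fetches:", run(push=False))
    print("fetches with broadcast:", run(push=True))
```

In this model the broadcast moves the page transfers off the readers' critical path. It ignores the extra traffic that pushing pages to nodes that never read them would cause, so it says nothing about the net benefit measured in the paper.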
