Memory consistency and event ordering in scalable shared-memory multiprocessors

Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture. This paper introduces a new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, concentrating on issues relevant to scalable architectures.

[1]  Gebräuchliche Fertigarzneimittel,et al.  V , 1893, Therapielexikon Neurologie.

[2]  Leslie Lamport,et al.  How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs , 2016, IEEE Transactions on Computers.

[3]  Kevin P. McAuliffe,et al.  RP3 Processor-Memory Element , 1985, ICPP.

[4]  D. Patterson The University of Southern California. , 1985 .

[5]  Kevin P. McAuliffe,et al.  The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture , 1985, ICPP.

[6]  Proceedings of the 16th annual international symposium on Computer architecture , 1986 .

[7]  Michel Dubois,et al.  Correct memory operation of cache-based multiprocessors , 1987, ISCA '87.

[8]  Dennis Shasha,et al.  Efficient and correct execution of parallel programs that share memory , 1988, TOPL.

[9]  F. Baskett,et al.  The 4D-MP graphics superworkstation: computing+graphics=40 MIPS+MFLOPS and 100000 lighted polygons per second , 1988, Digest of Papers. COMPCON Spring 88 Thirty-Third IEEE Computer Society International Conference.

[10]  Michel Dubois,et al.  Access ordering and coherence in shared memory multiprocessors , 1989 .

[11]  V. Rich Personal communication , 1989, Nature.

[12]  Anoop Gupta,et al.  The directory-based cache coherence protocol for the DASH multiprocessor , 1990, ISCA '90.

[13]  Anoop Gupta,et al.  Performance evaluation of memory consistency models for shared-memory multiprocessors , 1991, ASPLOS IV.

[14]  James R. Goodman,et al.  Cache Consistency and Sequential Consistency , 1991 .

[15]  Anoop Gupta,et al.  Two Techniques to Enhance the Performance of Memory Consistency Models , 1991, ICPP.

[16]  Anoop Gupta,et al.  Hiding memory latency using dynamic scheduling in shared-memory multiprocessors , 1992, ISCA '92.

[17]  Alan L. Cox,et al.  Evaluation of release consistent software distributed shared memory on emerging network technology , 1993, ISCA '93.

[18]  Sarita V. Adve,et al.  Designing memory consistency models for shared-memory multiprocessors , 1993 .

[19]  Kourosh Gharachorloo,et al.  Memory consistency models for shared-memory multiprocessors , 1995 .

[20]  Katherine A. Yelick,et al.  Optimizing parallel programs with explicit synchronization , 1995, PLDI '95.

[21]  Sarita V. Adve,et al.  Shared Memory Consistency Models: A Tutorial , 1996, Computer.

[22]  Sarita V. Adve,et al.  An evaluation of memory consistency models for shared-memory systems with ILP processors , 1996, ASPLOS VII.

[23]  Kourosh Gharachorloo,et al.  Towards transparent and efficient software distributed shared memory , 1997, SOSP.

[24]  M. Dubois,et al.  Memory access buffering in multiprocessors , 1986, ISCA '98.

[25]  W. Paul,et al.  Computer Architecture , 2000, Springer Berlin Heidelberg.