Update-based cache coherence protocols for scalable shared-memory multiprocessors

This paper presents two hardware-controlled update-based cache coherence protocols. The authors discuss the two major disadvantages of update protocols: the inefficiency of updates and the mismatch between the granularity of synchronization and that of data transfer. They present two enhancements to update-based protocols, a write-combining scheme and finer-grain synchronization, to overcome these disadvantages. The results show that these enhancements, when used together, allow the update-based protocols to significantly improve the execution time of a set of scientific applications compared with three invalidate-based protocols.
