Distance-adaptive update protocols for scalable shared-memory multiprocessors

Update protocols generally induce lower miss rates than invalidate protocols, but they tend to generate a great deal of traffic. This is one reason why they are considered to scale less cost-effectively than invalidate protocols and, as a result, are avoided in most existing designs of scalable shared-memory multiprocessors. However, given the increasing relative cost of cache misses, update protocols are becoming more worthy of exploration. In this paper, we present a model of sharing that is key to investigating the performance of optimized update protocols: the update distance model. The model gives insight into the update patterns that optimized protocols need to handle. Using this model, we design a new family of protocols that we call distance-adaptive protocols. In these schemes, the directory records the update patterns it observes and then uses them to selectively send updates and invalidations to processors. As a result, both traffic and miss rates are kept low. We present an implementation of these protocols based on a dynamic pointer scheme. Compared with efficient invalidate and delayed competitive-update protocols over five applications, one of the new protocols reduces execution time by an average of 15% and 10%, respectively.
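To make the idea of a directory that chooses between updates and invalidations concrete, the following is a minimal Python sketch. It is not the protocol from the paper. It assumes that the "update distance" of a processor is the number of writes by other processors between two of its own accesses to a block, that the directory is notified of every access, and that a fixed cutoff (`max_useful_distance`) separates sharers worth updating from sharers worth invalidating. The class, method, and parameter names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Hypothetical per-block directory state (illustrative only)."""
    max_useful_distance: int = 2                       # assumed cutoff, not from the paper
    since_access: dict = field(default_factory=dict)   # proc -> writes by others since its last access
    observed: dict = field(default_factory=dict)       # proc -> last observed update distance
    sharers: set = field(default_factory=set)          # procs currently holding a copy

    def on_access(self, proc):
        """Record the distance just observed for `proc` and make it a sharer."""
        if proc in self.since_access:
            self.observed[proc] = self.since_access[proc]
        self.since_access[proc] = 0
        self.sharers.add(proc)

    def on_write(self, writer):
        """Return the message sent to each other sharer: 'update' or 'invalidate'."""
        self.on_access(writer)  # a write is treated as an access by the writer
        actions = {}
        for proc in self.since_access:
            if proc == writer:
                continue
            self.since_access[proc] += 1
            if proc not in self.sharers:
                continue  # no copy to keep coherent, only keep counting
            # Assumed policy: use the last observed distance as the prediction.
            if self.observed.get(proc, 0) <= self.max_useful_distance:
                actions[proc] = "update"
            else:
                actions[proc] = "invalidate"
                self.sharers.discard(proc)
        return actions


entry = DirectoryEntry(max_useful_distance=2)
entry.on_access("P1")
first = entry.on_write("P0")      # no history for P1 yet, so it is updated
for _ in range(3):
    entry.on_write("P0")          # P1 still has not touched the block
entry.on_access("P1")             # observed distance for P1 is now 4
second = entry.on_write("P0")     # predicted distance 4 exceeds the cutoff
```

In this sketch a sharer that lets many updates pass unused is invalidated on later writes, which saves traffic, while a sharer that re-reads the block promptly keeps receiving updates, which avoids misses. A real design would also have to decide how the directory learns about accesses that hit in a local cache, which this sketch simply assumes away.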
