Implementation and evaluation of update-based cache protocols under relaxed memory consistency models

Abstract

Invalidation-based cache coherence protocols have been studied extensively in the context of large-scale shared-memory multiprocessors. Under a relaxed memory consistency model, most of the write latency can be hidden, but cache misses still cause a severe performance problem. By contrast, update-based protocols have the potential to reduce both write and read penalties under relaxed memory consistency models, because coherence misses can be completely eliminated. The purpose of this paper is to compare update- and invalidation-based protocols in terms of their ability to reduce or hide memory access latencies and their ease of implementation under relaxed memory consistency models. Based on a detailed simulation study, we find that write-update protocols augmented with simple competitive mechanisms (which we call competitive-update protocols) can hide all of the write latency and cut the read penalty by as much as 46%, at the cost of some increase in memory traffic. However, compared to write-invalidate protocols, update-based protocols require more aggressive memory consistency models and more local buffering in the second-level cache to be effective. In addition, their larger number of global writes may increase synchronization overhead in applications with high contention for critical sections.
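The trade-off described in the abstract (update protocols remove coherence misses but send more global writes, and a competitive mechanism limits that extra traffic) can be illustrated with a toy model. This is a hypothetical sketch, not the authors' simulator: it tracks a single shared block, ignores latency, buffering, and the consistency model, and simply counts read misses and coherence messages. It assumes one simple form of competitive mechanism, in which each cached copy counts the updates it has received since its last local access and invalidates itself once that count reaches a threshold; the function name `simulate` and the threshold value of 4 are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    read_misses: int = 0
    messages: int = 0  # invalidations or updates sent to other caches


def simulate(trace, protocol, threshold=4):
    """Toy model of one shared memory block (hypothetical, for illustration).

    trace: list of (processor_id, op) pairs, with op in {"R", "W"}.
    protocol: "invalidate", "update", or "competitive".
    threshold: assumed competitive threshold, used only by "competitive".
    """
    # processor id -> number of updates received since its last local access
    holders = {}
    stats = Stats()
    for proc, op in trace:
        if op == "R":
            if proc not in holders:
                stats.read_misses += 1  # cold or coherence miss
            holders[proc] = 0  # a local access resets the counter
        elif op == "W":
            others = [p for p in holders if p != proc]
            stats.messages += len(others)  # one message per other copy
            if protocol == "invalidate":
                for p in others:
                    del holders[p]  # remote copies are invalidated
            elif protocol == "update":
                pass  # remote copies stay valid and receive the new value
            elif protocol == "competitive":
                for p in others:
                    holders[p] += 1
                    if holders[p] >= threshold:
                        del holders[p]  # copy unused for too long: self-invalidate
            else:
                raise ValueError(f"unknown protocol: {protocol}")
            holders[proc] = 0  # the writer keeps (or allocates) a copy
        else:
            raise ValueError(f"unknown operation: {op}")
    return stats


if __name__ == "__main__":
    producer_consumer = [(0, "W"), (1, "R")] * 5
    unused_copy = [(0, "W"), (1, "R")] + [(0, "W")] * 10
    for name, trace in [("producer-consumer", producer_consumer),
                        ("unused copy", unused_copy)]:
        for protocol in ("invalidate", "update", "competitive"):
            s = simulate(trace, protocol)
            print(f"{name:18} {protocol:12} "
                  f"read misses={s.read_misses} messages={s.messages}")
```

In the producer-consumer trace, write-invalidate takes a read miss on every consumer read (5 misses), while both update variants take only the initial cold miss. In the trace where processor 1 stops reading, pure write-update keeps sending all 10 updates to the stale copy, whereas the competitive variant drops that copy after 4 updates, which is the kind of traffic reduction the competitive mechanism is meant to provide.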
