Boosting the performance of hybrid snooping cache protocols

Previous studies of bus-based shared-memory multiprocessors have shown hybrid write-invalidate/write-update snooping protocols to be incapable of providing consistent performance improvements over write-invalidate protocols. We analyze the deficiencies of hybrid snooping protocols under release consistency, and show how these deficiencies can be dramatically reduced by using write caches and read snarfing. Our performance evaluation is based on program-driven simulation and a set of five scientific applications with different sharing behaviors including migratory sharing as well as producer-consumer sharing. We show that a hybrid protocol, extended with write caches as well as read snarfing, manages to reduce the number of coherence misses by between 83% and 95% as compared to a write-invalidate protocol for all five applications in this study. In addition the number of bus transactions is reduced by between 36% and 60% for four of the applications and by 9% for the fifth application. Because of the small implementation cost of the hybrid protocol and the two extensions, we believe that this combination is an effective approach to boost the performance of bus-based multiprocessors.

[1]  Jean-Loup Baer,et al.  Two techniques for improving performance on bus-based multiprocessors , 1995, Future Gener. Comput. Syst..

[2]  Larry Rudolph,et al.  Dynamic decentralized cache schemes for mimd parallel processors , 1984, ISCA '84.

[3]  Larry Rudolph,et al.  Dynamic decentralized cache schemes for mimd parallel processors , 1984, ISCA 1984.

[4]  Anna R. Karlin,et al.  Competitive snoopy caching , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).

[5]  Lawrence C. Stewart,et al.  The Alpha demonstration unit: a high-performance multiprocessor , 1993, CACM.

[6]  Eric Williams,et al.  Performance optimizations, implementation, and verification of the SGI Challenge multiprocessor , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.

[7]  Per Stenström,et al.  Using Write Caches to Improve Performance of Cache Coherence Protocols in Shared-Memory Multiprocessors , 1995, J. Parallel Distributed Comput..

[8]  Robert J. Fowler,et al.  The Prospects for On-Line Hybrid Coherency Protocols on Bus-Based Multiprocessors , 1994 .

[9]  R. H. Katz,et al.  Evaluating the performance of four snooping cache coherency protocols , 1989, ISCA '89.

[10]  Randy H. Katz,et al.  Evaluating The Performance Of Four Snooping Cache Coherency Protocols , 1989, The 16th Annual International Symposium on Computer Architecture.

[11]  Philip J. Woest,et al.  The Wisconsin multicube: a new large-scale cache-coherent multiprocessor , 1988, ISCA '88.

[12]  Anoop Gupta,et al.  Performance evaluation of memory consistency models for shared-memory multiprocessors , 1991, ASPLOS IV.

[13]  Norman P. Jouppi Cache write policies and performance , 1993, ISCA '93.

[14]  Per Stenström,et al.  The Cachemire Test Bench A Flexible And Effective Approach For Simulation Of Multiprocessors , 1993, [1993] Proceedings 26th Annual Simulation Symposium.

[15]  James K. Archibald A cache coherence approach for large multiprocessor systems , 1988, ICS '88.

[16]  Lawrence C. Stewart,et al.  Firefly: a multiprocessor workstation , 1987, IEEE Trans. Computers.

[17]  Anoop Gupta,et al.  SPLASH: Stanford parallel applications for shared-memory , 1992, CARN.