A dynamic cache sub-block design to reduce false sharing

Parallel applications suffer from significant bus traffic due to the transfer of shared data. Large block sizes exploit locality and decrease the effective memory access time, but they also tend to group data together even though only part of a block is needed by any one processor. This is known as the false sharing problem. This research presents a dynamic sub-block coherence protocol that minimizes false sharing by attempting to dynamically locate the point of false reference. Sharing traffic is minimized by maintaining coherence on smaller blocks (sub-blocks) that are truly shared, whereas larger blocks are used as the basic units of transfer. The larger blocks exploit locality, while maintaining coherence on sub-blocks minimizes bus traffic due to shared misses. Simulation results indicate that the dynamic sub-block protocol reduces false sharing misses by 20 to 30 percent relative to the fixed sub-block scheme.
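
The separation between the unit of transfer and the unit of coherence can be illustrated with a small trace-driven sketch. It is not the protocol proposed in this work: it models only a simple write-invalidate scheme with a fixed coherence unit and an infinite cache, and the function name, parameters, and trace format are all assumptions made for illustration. It counts the misses caused by a remote invalidation when two processors write to different words of the same transfer block.

```python
def count_invalidation_misses(trace, block_words, coherence_words):
    """Count misses caused by remote invalidations (hypothetical model).

    trace            list of (processor_id, word_address, is_write)
    block_words      words per transfer block (fetched as a whole on a miss)
    coherence_words  words per coherence unit (invalidated on a remote write)

    Assumes an infinite cache, so every miss is either a cold miss or
    an invalidation miss.
    """
    procs = {p for p, _, _ in trace}
    valid = {p: set() for p in procs}  # coherence units currently valid
    seen = {p: set() for p in procs}   # coherence units ever fetched
    units_per_block = block_words // coherence_words
    invalidation_misses = 0

    for p, addr, is_write in trace:
        unit = addr // coherence_words
        if unit not in valid[p]:
            if unit in seen[p]:
                # The unit was cached earlier and then invalidated remotely.
                invalidation_misses += 1
            # A miss transfers the whole block: every unit in it becomes valid.
            first = (addr // block_words) * units_per_block
            for u in range(first, first + units_per_block):
                valid[p].add(u)
                seen[p].add(u)
        if is_write:
            # Write-invalidate: remote copies of this coherence unit are dropped.
            for q in procs:
                if q != p:
                    valid[q].discard(unit)
    return invalidation_misses


# Processor 0 writes word 0 and processor 1 writes word 4, alternating.
# Both words fall in the same 8-word transfer block, but no word is shared.
trace = [(p, w, True) for _ in range(4) for p, w in ((0, 0), (1, 4))]

whole_block = count_invalidation_misses(trace, block_words=8, coherence_words=8)
sub_block = count_invalidation_misses(trace, block_words=8, coherence_words=4)
print(whole_block, sub_block)  # prints "6 0"
```

With coherence kept on the whole 8-word block, every access after the two cold misses is an invalidation miss, although the processors never touch the same word. With 4-word sub-blocks the two words fall in different coherence units and those misses disappear, while each miss still transfers the full block.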
