Analysis of Cache Invalidation Patterns in Shared-Memory Multiprocessors

To make shared-memory multiprocessors scalable, researchers are now exploring cache coherence protocols that do not rely on broadcast, but instead send invalidation messages to individual caches that contain stale data. The feasibility of such directory-based protocols is highly sensitive to the cache invalidation patterns that parallel programs exhibit. In this paper, we analyze the cache invalidation patterns caused by several parallel applications and investigate the effect of these patterns on a directory-based protocol. Our results are based on multiprocessor traces with 4, 8 and 16 processors. To gain insight into what the invalidation patterns would look like beyond 16 processors, we propose a classification scheme for data objects found in parallel applications and link the invalidation traffic patterns observed in the traces back to these high-level objects. Our results show that synchronization objects have very different invalidation patterns from those of other data objects. We point out situations where restructuring the application seems appropriate to reduce the invalidation traffic, and others where hardware support is more appropriate. Our results also show that it should be possible to scale “well-written” parallel programs to a large number of processors without an explosion in invalidation traffic.

[1]  K. Mani Chandy,et al.  Asynchronous distributed simulation via a sequence of parallel computations , 1981, CACM.

[2]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[3]  Anoop Gupta,et al.  Memory-reference characteristics of multiprocessor applications under MACH , 1988, SIGMETRICS 1988.

[4]  Jonathan Rose LocusRoute: a parallel global router for standard cells , 1988, 25th ACM/IEEE, Design Automation Conference.Proceedings 1988..

[5]  Anant Agarwal,et al.  Multiprocessor cache analysis using ATUM , 1988, ISCA '88.

[6]  J. Mcdonald,et al.  Vectorization of a particle simulation method for hypersonic rarefied flow , 1988 .

[7]  Randy H. Katz,et al.  Implementing a cache consistency protocol , 1985, ISCA 1985.

[8]  Pradeep S. Sindhu,et al.  The Architecture of the Dragon , 1985, COMPCON.

[9]  James H. Patterson,et al.  Portable Programs for Parallel Processors , 1987 .

[10]  Abhinav Gupta,et al.  Analysis of cache invalidation patterns in multiprocessors , 1989, ASPLOS 1989.

[11]  Paul Feautrier,et al.  A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.

[12]  Lawrence C. Stewart,et al.  Firefly: a multiprocessor workstation , 1987, ASPLOS 1987.

[13]  Shreekant S. Thakkar,et al.  The Symmetry Multiprocessor System , 1988, ICPP.

[14]  Tom Blank,et al.  Parallel logic simulation on general purpose machines , 1988, 25th ACM/IEEE, Design Automation Conference.Proceedings 1988..