Analysis of cache invalidation patterns in multiprocessors

To make shared-memory multiprocessors scalable, researchers are now exploring cache coherence protocols that do not rely on broadcast, but instead send invalidation messages to individual caches that contain stale data. The feasibility of such directory-based protocols is highly sensitive to the cache invalidation patterns that parallel programs exhibit. In this paper, we analyze the cache invalidation patterns caused by several parallel applications and investigate the effect of these patterns on a directory-based protocol. Our results are based on multiprocessor traces with 4, 8 and 16 processors. To gain insight into what the invalidation patterns would look like beyond 16 processors, we propose a classification scheme for data objects found in parallel applications and link the invalidation traffic patterns observed in the traces back to these high-level objects. Our results show that synchronization objects have very different invalidation patterns from those of other data objects. A write reference to a synchronization object usually causes invalidations in many more caches. We point out situations where restructuring the application seems appropriate to reduce the invalidation traffic, and others where hardware support is more appropriate. Our results also show that it should be possible to scale “well-written” parallel programs to a large number of processors without an explosion in invalidation traffic.

[1]  Paul Feautrier,et al.  A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.

[2]  K. Mani Chandy,et al.  Asynchronous distributed simulation via a sequence of parallel computations , 1981, CACM.

[3]  J. Goodman Using cache memory to reduce processor-memory traffic , 1983, ISCA '83.

[4]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[5]  A. Sangiovanni-Vincentelli,et al.  The TimberWolf placement and routing package , 1985, IEEE Journal of Solid-State Circuits.

[6]  Pradeep S. Sindhu,et al.  The Architecture of the Dragon , 1985, COMPCON.

[7]  Randy H. Katz,et al.  Implementing a cache consistency protocol , 1985, ISCA '85.

[8]  James H. Patterson,et al.  Portable Programs for Parallel Processors , 1987 .

[9]  Lawrence C. Stewart,et al.  Firefly: a multiprocessor workstation , 1987, IEEE Trans. Computers.

[10]  Tom Blank,et al.  Parallel logic simulation on general purpose machines , 1988, 25th ACM/IEEE, Design Automation Conference.Proceedings 1988..

[11]  Shreekant S. Thakkar,et al.  The Symmetry Multiprocessor System , 1988, ICPP.

[12]  Anant Agarwal,et al.  Multiprocessor cache analysis using ATUM , 1988, ISCA '88.

[13]  Jonathan Rose LocusRoute: a parallel global router for standard cells , 1988, 25th ACM/IEEE, Design Automation Conference.Proceedings 1988..

[14]  J. Mcdonald,et al.  Vectorization of a particle simulation method for hypersonic rarefied flow , 1988 .

[15]  Anoop Gupta,et al.  Memory-reference characteristics of multiprocessor applications under MACH , 1988, SIGMETRICS '88.