Evaluating the performance of software cache coherence

In a shared-memory multiprocessor with private caches, cached copies of a data item must be kept consistent. This is called cache coherence. Both hardware and software coherence schemes have been proposed. Software techniques are attractive because they avoid hardware complexity and can be used with any processor-memory interconnection. This paper presents an analytical model of the performance of two software coherence schemes and, for comparison, snoopy-cache hardware. The model is validated against address traces from a bus-based multiprocessor. The behavior of the coherence schemes under various workloads is compared, and their sensitivity to variations in workload parameters is assessed. The analysis shows that the performance of software schemes is critically determined by certain parameters of the workload: the proportion of data accesses, the fraction of shared references, and the number of times a shared block is accessed before it is purged from the cache. Snoopy caches are more resilient to variations in these parameters. Thus, when evaluating a software scheme as a design alternative, it is essential to consider the characteristics of the expected workload. The performance of the two software schemes with a multistage interconnection network is also evaluated, and both are found to scale well.
