Evaluating the performance of software cache coherence

In a shared-memory multiprocessor with private caches, cached copies of a data item must be kept consistent; this requirement is called cache coherence. Both hardware and software coherence schemes have been proposed. Software techniques are attractive because they avoid hardware complexity and can be used with any processor-memory interconnection. This paper presents an analytical model of the performance of two software coherence schemes and, for comparison, snoopy-cache hardware. The model is validated against address traces from a bus-based multiprocessor. The behavior of the coherence schemes under various workloads is compared, and their sensitivity to variations in workload parameters is assessed. The analysis shows that the performance of software schemes depends critically on three workload parameters: the proportion of data accesses, the fraction of shared references, and the number of times a shared block is accessed before it is purged from the cache. Snoopy caches are more resilient to variations in these parameters. Thus, when evaluating a software scheme as a design alternative, it is essential to consider the characteristics of the expected workload. The performance of the two software schemes with a multistage interconnection network is also evaluated, and both are found to scale well.
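The sensitivity to these three workload parameters can be illustrated with a toy calculation. The sketch below is not the paper's analytical model; it is a minimal illustration under assumptions of our own: the function name `toy_miss_rate`, the parameter `base_miss_rate`, and the simplification that a shared block incurs exactly one miss each time it is reloaded after a purge are all hypothetical.

```python
def toy_miss_rate(data_frac, shared_frac, accesses_before_purge,
                  base_miss_rate=0.02):
    """Toy per-reference miss rate under a software coherence scheme.

    Assumptions (illustrative only, not taken from the paper):
      - data_frac: fraction of all memory references that are data accesses
      - shared_frac: fraction of data accesses that touch shared data
      - accesses_before_purge: accesses to a shared block before software
        purges it; each reload after a purge costs exactly one miss
      - base_miss_rate: miss rate of all non-shared references
    """
    if not (0.0 <= data_frac <= 1.0 and 0.0 <= shared_frac <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    if accesses_before_purge < 1:
        raise ValueError("a block is accessed at least once before a purge")

    # Fraction of all references that go to shared data.
    shared_refs = data_frac * shared_frac
    # One miss per residency of a shared block in the cache.
    shared_miss_rate = 1.0 / accesses_before_purge
    return (1.0 - shared_refs) * base_miss_rate + shared_refs * shared_miss_rate


if __name__ == "__main__":
    # Sweep the number of accesses per residency for a fixed workload mix.
    for n in (1, 2, 4, 8, 16):
        print(n, round(toy_miss_rate(0.3, 0.2, n), 4))
```

Even in this simplified form, the miss rate rises sharply as shared blocks are purged after fewer accesses, which is consistent with the qualitative sensitivity the abstract describes.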
