TEMPORAL, PROCESSOR, AND SPATIAL LOCALITY IN MULTIPROCESSOR MEMORY REFERENCES

The performance of cache-coherent multiprocessors is strongly influenced by locality in the memory reference behavior of parallel applications. While temporal and spatial locality in uniprocessor memory references are well understood, the corresponding notions of locality in multiprocessors, and their impact on multiprocessor cache behavior, are not clear. We derive a locality model suitable for multiprocessor cache evaluation by viewing memory references as streams of processor identifiers directed at specific cache/memory blocks. This viewpoint differs from the traditional uniprocessor approach, which uses streams of addresses to different blocks emanating from specific processors. Our view is based on the intuition that cache coherence traffic in multiprocessors is largely determined by the number of processors accessing a location, the frequency with which they access it, and the sequence in which their accesses occur. The specific locations accessed by each processor, the time order of accesses to different locations, and the size of the working set play a smaller role in determining cache coherence traffic, although they still influence intrinsic cache performance. Looking at traces from the viewpoint of a memory block leads to a new notion of reference locality for multiprocessors, called processor locality. In this paper, we study the temporal, spatial, and processor locality in the memory reference patterns of three parallel applications. Based on the observed locality, we then reflect on the expected cache behavior of the three applications.
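The block-centric view described above can be illustrated with a small sketch. It regroups a conventional trace of (processor, address) pairs into one stream of processor identifiers per block, then measures how many consecutive references each processor makes to a block before another processor intervenes. The trace format, the 16-byte block size, and the helper names `per_block_streams` and `run_lengths` are assumptions made for illustration; the run-length measure is only one simple proxy and is not necessarily the definition of processor locality used in the paper.

```python
from collections import defaultdict
from itertools import groupby

BLOCK_SIZE = 16  # bytes per cache block; an illustrative assumption


def per_block_streams(trace, block_size=BLOCK_SIZE):
    """Regroup a (processor_id, address) trace into per-block processor-ID streams."""
    streams = defaultdict(list)
    for proc, addr in trace:
        # All addresses within the same block map to the same stream.
        streams[addr // block_size].append(proc)
    return dict(streams)


def run_lengths(stream):
    """Lengths of maximal runs of consecutive references by the same processor."""
    return [len(list(group)) for _, group in groupby(stream)]


# Hypothetical trace: two processors touching two blocks.
trace = [
    (0, 0x100), (0, 0x104), (1, 0x108), (1, 0x10C),
    (0, 0x200), (1, 0x100), (0, 0x204),
]

streams = per_block_streams(trace)
for block, stream in sorted(streams.items()):
    print(block, stream, run_lengths(stream))
```

Under this sketch, a block whose stream consists of long runs by one processor would be expected to generate little coherence traffic, while a block whose stream alternates rapidly between processors would be expected to generate more.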
