Performance of Pruning-Cache Directories for Large-Scale Multiprocessors

Multis, shared-memory multiprocessors that are implemented with single buses and snooping cache protocols are inherently limited to a small number of processors, and, as systems grow beyond a single bus, the bandwidth requirements of broadcast operations limit scalability. Hardware support to provide cache coherence without the use of broadcast can become very expensive. An approach to maintaining coherence using approximate information held in special-purpose caches called pruning-caches that provides robust performance over a wide range of workloads is presented. The pruning-cache approach is compared to the more conventional inclusion cache for providing multilevel inclusion (MLI) in the cache hierarchy. It is shown that pruning-caches are more cost-effective and more robust. Using both analysis and simulation, it is also shown that the k-ary n-cube topology provides scalable, bottleneck-free communication for uniform, point-to-point traffic. >

[1]  Nian-Feng Tzeng,et al.  Distributing Hot-Spot Addressing in Large-Scale Multiprocessors , 1987, IEEE Transactions on Computers.

[2]  Philip J. Woest,et al.  Scalability and Its Application to Multicube , 1989 .

[3]  Paul Feautrier,et al.  A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.

[4]  Anoop Gupta,et al.  Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes , 1990, ICPP.

[5]  Bell Cg,et al.  Multis: a new class of multiprocessor computers. , 1985 .

[6]  M. Hill,et al.  Weak ordering-a new definition , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[7]  John P. Hayes,et al.  Multiple Bus Architectures , 1987, Computer.

[8]  Ralph Grishman,et al.  The NYU Ultracomputer—Designing an MIMD Shared Memory Parallel Computer , 1983, IEEE Transactions on Computers.

[9]  Andrew W. Wilson,et al.  Hierarchical cache/bus architecture for shared memory multiprocessors , 1987, ISCA '87.

[10]  Kevin P. McAuliffe,et al.  The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture , 1985, ICPP.

[11]  Stuart E. Madnick,et al.  Propeties of storage hierarchy systems with multiple page sizes and redundant data , 1979, ACM Trans. Database Syst..

[12]  Leslie Lamport,et al.  How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs , 2016, IEEE Transactions on Computers.

[13]  Michael L. Scott,et al.  Synchronization without contention , 1991, ASPLOS IV.

[14]  Mary K. Vernon,et al.  Efficient synchronization primitives for large-scale cache-coherent multiprocessors , 1989, ASPLOS 1989.

[15]  Philip J. Woest,et al.  The Wisconsin multicube: a new large-scale cache-coherent multiprocessor , 1988, ISCA '88.

[16]  Anant Agarwal,et al.  Limits on Interconnection Network Performance , 1991, IEEE Trans. Parallel Distributed Syst..

[17]  Mary K. Vernon,et al.  Efficient synchronization primitives for large-scale cache-coherent multiprocessors , 1989, ASPLOS III.

[18]  Trevor N. Mudge,et al.  Analysis of bus hierarchies for multiprocessors , 1988, ISCA '88.