A new scalable directory architecture for large-scale multiprocessors

The memory overhead introduced by directories constitutes a major hurdle to the scalability of cc-NUMA architectures and makes the shared-memory paradigm infeasible for very large-scale systems. This work focuses on improving the scalability of shared-memory multiprocessors by significantly reducing the size of the directory. We propose multilayer clustering as an effective approach to reduce the directory-entry width. A detailed evaluation for 64 processors shows that this approach drastically reduces the memory overhead while suffering a performance degradation similar to that of previous compressed schemes (such as Coarse Vector). In addition, we propose a novel two-level directory architecture to eliminate the penalty caused by these compressed directories. This organization consists of a small Full-Map first-level directory, which provides precise information for the most recently referenced lines, and a compressed second-level directory, which provides in-excess information (a superset of the actual sharers). Results show that a system with this directory architecture can achieve the same performance as a multiprocessor with a large, non-scalable Full-Map directory, with a very significant reduction in memory overhead.
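The entry-width reduction can be made concrete with a small Python model for 64 nodes. The encodings below are illustrative assumptions, not the paper's exact formats. Full-Map keeps one presence bit per node. The coarse vector keeps one bit per group of K = 4 nodes, where the group size is an assumed value. The clustering encoding is a hypothetical scheme in the spirit of multilayer clustering: nodes are grouped recursively into a binary tree of clusters, and the entry records only the smallest aligned cluster that contains every sharer, stored as a heap index. All helper names are hypothetical.

```python
"""Directory-entry width comparison for N = 64 nodes (illustrative sketch)."""

N = 64  # number of nodes (processors) tracked per directory entry
K = 4   # assumed coarse-vector group size (hypothetical choice)


def full_map(sharers):
    """Exact sharer set as an N-bit presence vector."""
    vec = 0
    for s in sharers:
        vec |= 1 << s
    return vec


def coarse_vector(sharers):
    """One bit per group of K consecutive nodes."""
    vec = 0
    for s in sharers:
        vec |= 1 << (s // K)
    return vec


def coarse_decode(vec):
    """Every node whose group bit is set (a superset of the true sharers)."""
    return {n for n in range(N) if (vec >> (n // K)) & 1}


def cluster_encode(sharers):
    """Heap index of the smallest aligned cluster holding all sharers.

    Hypothetical layout: index 1 is the root cluster (all N nodes) and
    index N + i is the single-node cluster {i}. Assumes a non-empty set.
    """
    lo, hi = min(sharers), max(sharers)
    level = 0  # cluster size is 2**level
    while (lo >> level) != (hi >> level):
        level += 1
    return (N >> level) + (lo >> level)


def cluster_decode(index):
    """All nodes in the cluster named by a heap index (a superset of the true sharers)."""
    level = (N.bit_length() - 1) - (index.bit_length() - 1)
    base = index - (N >> level)
    size = 1 << level
    return set(range(base * size, (base + 1) * size))


if __name__ == "__main__":
    widths = {
        "Full-Map": N,                              # one bit per node
        "Coarse vector": N // K,                    # one bit per group
        "Cluster index": (2 * N - 1).bit_length(),  # indices 1 .. 2N-1
    }
    print(widths)  # {'Full-Map': 64, 'Coarse vector': 16, 'Cluster index': 7}
    sharers = {5, 6}
    print(sorted(coarse_decode(coarse_vector(sharers))))    # [4, 5, 6, 7]
    print(sorted(cluster_decode(cluster_encode(sharers))))  # [4, 5, 6, 7]
```

Decoding either compressed entry yields a superset of the true sharers, which is the in-excess information described above: a write must then send invalidations to every node in that superset, and this extra traffic is the performance penalty that the compressed schemes pay for their narrower entries.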
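The two-level organization can be sketched as a small Python model. This is a minimal illustration under assumptions of our own, not the paper's design. The first level is modeled as an LRU cache of exact sharer sets whose size is set by a hypothetical `first_level_entries` parameter. The second level holds one coarse vector per line with an assumed group size K = 4. When a line is re-allocated in the first level, its entry is seeded from the second-level superset, so it stays conservative until the next write makes the sharer set exact again.

```python
from collections import OrderedDict

N = 64  # nodes
K = 4   # assumed coarse-vector group size for the second level


def compress(sharers):
    """Coarse vector: bit g is set if any node in group g may hold the line."""
    vec = 0
    for s in sharers:
        vec |= 1 << (s // K)
    return vec


def expand(vec):
    """Superset of sharers implied by a coarse vector."""
    return {n for n in range(N) if (vec >> (n // K)) & 1}


class TwoLevelDirectory:
    """Hypothetical model: small exact first level in front of a compressed second level."""

    def __init__(self, first_level_entries):
        self.capacity = first_level_entries
        self.first = OrderedDict()  # line -> exact sharer set, least recently used first
        self.second = {}            # line -> coarse vector, kept for every line

    def _touch(self, line):
        """Return the first-level entry for line, allocating it (and evicting LRU) if needed."""
        if line in self.first:
            self.first.move_to_end(line)
        else:
            if len(self.first) >= self.capacity:
                # The evicted line keeps only its compressed second-level entry.
                self.first.popitem(last=False)
            # Assumption: seed a re-allocated entry with the second-level superset.
            self.first[line] = expand(self.second.get(line, 0))
        return self.first[line]

    def read(self, line, node):
        """Record node as a sharer in both levels."""
        sharers = self._touch(line)
        sharers.add(node)
        self.second[line] = compress(sharers)

    def write(self, line, node):
        """Return the nodes to invalidate, then make node the only sharer."""
        if line in self.first:
            targets = self.first[line] - {node}            # precise (first-level hit)
        else:
            targets = expand(self.second.get(line, 0)) - {node}  # in-excess
        self._touch(line)
        self.first[line] = {node}
        self.second[line] = compress({node})
        return targets


if __name__ == "__main__":
    d = TwoLevelDirectory(first_level_entries=2)
    d.read(10, 5)
    d.read(10, 6)
    print(sorted(d.write(10, 1)))  # [5, 6]
```

In this model, a write that hits in the first level invalidates only the true sharers. If the line has been evicted by more recently referenced lines, the same write falls back to the in-excess second-level entry; for sharers {5, 6} it would invalidate nodes 4 to 7. Keeping the first level small but hot is what lets the combined structure approach Full-Map behavior at a fraction of its storage.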
