Segment directory enhancing the limited directory cache coherence schemes

We present a new arrangement of directory bits called the segment directory to improve directory storage efficiency: a segment directory can point to several sharing processors with almost the same number of bits as the pointer which can point to only one. Many directory overflows can be eliminated by using the segment directory element in place of the pointer in the limited directory schemes. Also, the segment directory can be implemented without introducing additional hardware overhead and protocol complexity. The detailed execution-driven simulations show that the segment directory always does better than the pointer and eliminates many directory overflows by up to 85%. The resulting improvement in bandwidth and execution time is analyzed in detail for limited directory schemes having different behaviours, with respect to the reduced directory overflows.

[1]  Michel Dubois,et al.  Correct memory operation of cache-based multiprocessors , 1987, ISCA '87.

[2]  D. Lenoski,et al.  The SGI Origin: A ccnuma Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[3]  Daniel E. Lenoski,et al.  Scalable Shared-Memory Multiprocessing , 1995 .

[4]  Jr. Richard Thomas Simoni,et al.  Cache coherence directories for scalable multiprocessors , 1992 .

[5]  Paul Feautrier,et al.  A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.

[6]  A. Gupta,et al.  The Stanford FLASH multiprocessor , 1994, Proceedings of 21 International Symposium on Computer Architecture.

[7]  Tom Lovett,et al.  STiNG: A CC-NUMA Computer System for the Commercial Marketplace , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[8]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[9]  Kyu Ho Park,et al.  A New Efficient Arrangement of Directory Bits for Cache Coherence , 1998, ISCA 1998.

[10]  Maged M. Michael,et al.  Coherence controller architectures for SMP-based CC-NUMA multiprocessors , 1997, ISCA '97.

[11]  Ricardo Bianchini,et al.  The MIT Alewife machine: architecture and performance , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[12]  Pat Helland,et al.  The Mercury Interconnect Architecture: A Cost-effective Infrastructure For High-performance Servers , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[13]  Anant Agarwal,et al.  LimitLESS directories: A scalable cache coherence scheme , 1991, ASPLOS IV.

[14]  Anant Agarwal,et al.  LimitLESS directories: A scalable cache coherence scheme , 1991, ASPLOS IV.

[15]  James K. Archibald,et al.  Cache coherence protocols: evaluation using a multiprocessor simulation model , 1986, TOCS.

[16]  Stephen R. Goldschmidt,et al.  Simulation of multiprocessors: accuracy and performance , 1993 .

[17]  Mark Horowitz,et al.  An evaluation of directory schemes for cache coherence , 1998, ISCA '98.

[18]  Anant Agarwal,et al.  Limits on Interconnection Network Performance , 1991, IEEE Trans. Parallel Distributed Syst..

[19]  Hugh Garraway Parallel Computer Architecture: A Hardware/Software Approach , 1999, IEEE Concurrency.

[20]  Anoop Gupta,et al.  Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes , 1990, ICPP.

[21]  Stein Gjessing,et al.  Distributed-directory scheme: scalable coherent interface , 1990, Computer.

[22]  A. Richard Newton,et al.  An empirical evaluation of two memory-efficient directory methods , 1990, ISCA '90.