Lightweight hardware distributed shared memory supported by generalized combining

On a large-scale parallel computer system, shared memory provides a general and convenient programming environment. This paper describes a lightweight method for constructing an efficient shared memory system supported by hierarchical coherence management and generalized combining, two techniques that cooperate with each other. We eliminate the following heavyweight, high-cost factors: a directory whose size is proportional to the number of processors, a separate memory component for the directory, tag/state information, and a protocol processor. In our method, the amount of memory required for the directory is proportional to the logarithm of the number of processors. This implies that a single word per memory block is sufficient to cover a massively parallel system and that the access costs of the directory are small. Moreover, our combining technique, generalized combining, does not rely on the chance events that existing combining networks depend on, namely messages meeting one another at a switching node. A switching node can combine succeeding messages with a preceding one even after the preceding message has left the node, which can increase the rate of successful combining. We have developed a prototype parallel computer, OCHANOMIZ-5, which implements this lightweight distributed shared memory and generalized combining with simple hardware. The results of evaluating the prototype's performance using several programs show that our methodology provides the advantages of parallelization.
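As a rough illustration of the directory-size claim, the sketch below compares the per-block directory cost of a conventional full-map scheme (one presence bit per processor) with an entry whose size grows with the logarithm of the processor count. The function names and the `bits_per_level` constant are assumptions made for this example; the abstract states only the proportionality and does not give the actual entry layout used in OCHANOMIZ-5.

```python
def full_map_bits(num_processors: int) -> int:
    """Full-map directory: one presence bit per processor for each memory block."""
    return num_processors


def log_directory_bits(num_processors: int, bits_per_level: int = 1) -> int:
    """Hypothetical log-proportional entry: bits_per_level * ceil(log2(N)).

    bits_per_level is an assumed constant of proportionality, not a value
    taken from the paper.
    """
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integers only.
    levels = max(1, (num_processors - 1).bit_length())
    return bits_per_level * levels


if __name__ == "__main__":
    # Compare per-block directory cost as the machine grows.
    for n in (16, 1024, 65536, 1 << 20):
        print(n, full_map_bits(n), log_directory_bits(n))
```

Under these assumptions, even a machine with about a million processors needs only a few tens of bits per block, which is consistent with the statement that a single word per memory block suffices.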
