Adaptive granularity: Transparent integration of fine- and coarse-grain communication

The granularity of shared data is one of the key factors affecting the performance of distributed shared memory (DSM) machines. Because programs exhibit widely varying sharing patterns, providing only one or two fixed granularities cannot use resources efficiently. On the other hand, supporting arbitrary granularity sizes significantly increases both hardware complexity and software overhead. Furthermore, the efficient use of arbitrary granularities puts the burden on users to provide information about program behavior to compilers and/or runtime systems. These kinds of requirements tend to restrict the programmability of the shared memory model. In this paper we present a new communication scheme, called adaptive granularity (AG). Adaptive granularity makes it possible to transparently integrate bulk transfer into the shared memory model by supporting variable-size granularity and memory replication. It consists of two protocols: one for small data and another for large data. For small data, the standard hardware DSM protocol is used and the granularity is fixed to the size of a cache line. For large array data, the protocol for bulk data is used instead and the granularity varies at runtime depending on the sharing behavior of the application. Simulation results show that AG improves performance by up to 43% over the hardware implementation of DSM (e.g., DASH, Alewife). Compared with an equivalent architecture that supports fine-grain memory replication at the fixed granularity of a cache line (e.g., Typhoon), AG reduces execution time by up to 35%.
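The two-protocol split described above can be illustrated with a minimal sketch. It is not the paper's implementation: the cache-line size, the size threshold for "large" data, the function names, and the halve-on-conflict / double-otherwise adjustment rule are all assumptions chosen only to make the idea concrete.

```python
# Hypothetical constants; the abstract does not specify these values.
CACHE_LINE_BYTES = 64          # assumed cache-line size
BULK_THRESHOLD_BYTES = 4096    # assumed cutoff for treating data as "large"


def choose_protocol(size_bytes: int, is_array: bool) -> str:
    """Pick a protocol: bulk for large array data, cache-line otherwise."""
    if is_array and size_bytes >= BULK_THRESHOLD_BYTES:
        return "bulk"
    return "cache_line"


def next_granularity(current: int, conflict: bool, max_bytes: int) -> int:
    """Assumed adaptation rule for the bulk protocol.

    Halve the transfer unit when a sharing conflict was observed,
    otherwise double it, staying within [CACHE_LINE_BYTES, max_bytes].
    """
    if conflict:
        return max(CACHE_LINE_BYTES, current // 2)
    return min(max_bytes, current * 2)
```

Under these assumptions, a scalar or small structure always stays on the fixed cache-line path, while a large array starts on the bulk path and its transfer unit shrinks toward a cache line only if nodes contend for it.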
