Visualising Sharing Behaviour in relation to Shared Memory Management

Accesses to shared memory remain a major performance limitation in shared memory multiprocessors. Scalable multiprocessors with distributed memory also pose the problem of keeping the memory coherent. A large number of shared memory coherence mechanisms have been proposed to solve this problem. Their relative performance is, however, determined by the sharing behaviour of the workloads. This paper presents a methodology to capture and visualise the sharing behaviour of a parallel program with respect to the choice of coherence mechanism. We identify four conceptual workload parameters: spatial granularity, degree of sharing, access mode, and temporal granularity. To demonstrate the effectiveness of the methodology, we have analysed the sharing behaviour of two parallel applications. The result is used to judge which shared memory coherence mechanism is most appropriate.

The shared memory paradigm for programming parallel applications on multiprocessors and other environments has emerged as the preferred paradigm over alternatives such as the message passing model. Several small-scale, bus-based, shared memory multiprocessors are now commercially available, and numerous research projects are under way on large-scale shared memory multiprocessors, e.g. [1, 15]. In addition, a number of ongoing projects aim to put a shared memory environment on distributed memory machines, or to use a network of workstations as a shared memory multicomputer [16].

To make shared memory multiprocessors scalable, attention has recently turned to distributed shared memory, where the memory is distributed among the processing nodes. Figure 1 shows a general view of such an architecture. A processor, together with some memory, forms a processing node. A number of processing nodes are interconnected by an interconnection network. The local memory of a processing node is directly accessible by its processor.
The contents of the memory of the other processing nodes are accessible either directly or through some software mechanism, in both cases at a substantially higher cost than for the local memory. The distribution of memory across the processing nodes makes it very important to exploit the locality of reference of parallel programs in order to minimise the average access time to shared memory. One approach to automating this is to use memory coherence mechanisms so that shared variables may be replicated or automatically migrated between the processing nodes. Several memory coherence mechanisms have been proposed in the literature [8, 14, 20]. They encompass both cache coherence maintenance and virtual page level management. …
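The gap between local and remote access cost is what makes locality of reference so critical. A back-of-the-envelope model makes this concrete; the latency values below are illustrative assumptions, not measurements from the paper:

```python
# Minimal NUMA access-cost model (illustrative; the latencies are
# assumed values, not figures reported in the paper).
LOCAL_LATENCY = 1.0    # cost of a local-memory access (normalised)
REMOTE_LATENCY = 10.0  # cost of a remote access over the interconnect

def average_access_time(local_fraction):
    """Average shared-memory access time as a function of the
    fraction of references satisfied by local memory."""
    return (local_fraction * LOCAL_LATENCY
            + (1.0 - local_fraction) * REMOTE_LATENCY)

# Raising the locally satisfied fraction from 50% to 90% cuts the
# mean access cost sharply, which is what replication and migration
# of shared data aim to achieve.
print(round(average_access_time(0.5), 2))  # 5.5
print(round(average_access_time(0.9), 2))  # 1.9
```

The model is deliberately simple: it ignores contention and coherence traffic, but it shows why a coherence mechanism that improves locality can dominate raw interconnect speed.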
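Two of the four workload parameters named in the abstract, degree of sharing and access mode, can be illustrated with a small trace-analysis sketch. Everything here is an assumption for illustration: the trace format, the block size chosen as the spatial granularity, and the helper name `analyse_trace` are not from the paper.

```python
from collections import defaultdict

# Hypothetical trace format: (processor_id, address, mode) tuples,
# where mode is "r" (read) or "w" (write).
BLOCK_SIZE = 64  # bytes; an assumed spatial granularity, e.g. a cache line

def analyse_trace(trace):
    """Summarise sharing behaviour per memory block: which processors
    touch it (degree of sharing) and whether it is read-only or
    read-write (access mode)."""
    readers = defaultdict(set)
    writers = defaultdict(set)
    for pid, addr, mode in trace:
        block = addr // BLOCK_SIZE  # map address to its block
        (writers if mode == "w" else readers)[block].add(pid)
    summary = {}
    for block in set(readers) | set(writers):
        sharers = readers[block] | writers[block]
        summary[block] = {
            "degree_of_sharing": len(sharers),
            "access_mode": "read-write" if writers[block] else "read-only",
        }
    return summary

trace = [(0, 0, "r"), (1, 8, "r"), (0, 64, "w"), (0, 72, "r")]
summary = analyse_trace(trace)
# Block 0 is read by two processors (read-only, widely shared);
# block 1 is read and written by one processor (read-write, private).
```

A classification like this is exactly what a coherence-mechanism choice hinges on: widely shared read-only blocks favour replication, while read-write blocks with few sharers favour migration or ownership-based protocols.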

[1]  Anoop Gupta,et al.  Competitive management of distributed shared memory , 1989, Digest of Papers. COMPCON Spring 89. Thirty-Fourth IEEE Computer Society International Conference: Intellectual Leverage.

[2]  Anant Agarwal,et al.  Directory-based cache coherence in large-scale multiprocessors , 1990, Computer.

[3]  Allen D. Malony,et al.  Experimentally Characterizing the Behavior of Multiprocessor Memory Systems. A Case Study , 1990, IEEE Trans. Software Eng..

[4]  Per Stenström,et al.  The Cachemire Test Bench A Flexible And Effective Approach For Simulation Of Multiprocessors , 1993, [1993] Proceedings 26th Annual Simulation Symposium.

[5]  Anoop Gupta,et al.  Memory-reference characteristics of multiprocessor applications under MACH , 1988, SIGMETRICS '88.

[6]  P. Stenstrom A survey of cache coherence schemes for multiprocessors , 1990, Computer.

[7]  Michel Dubois,et al.  Dynamic page migration in multiprocessors with distributed global memory , 1988, [1988] Proceedings. The 8th International Conference on Distributed Computing Systems.

[8]  Josep Torrellas,et al.  Shared Data Placement Optimizations to Reduce Multiprocessor Cache Miss Rates , 1990, ICPP.

[9]  Mark Horowitz,et al.  An evaluation of directory schemes for cache coherence , 1998, ISCA '98.

[10]  Anoop Gupta,et al.  The Stanford Dash multiprocessor , 1992, Computer.

[11]  Willy Zwaenepoel,et al.  Adaptive software cache management for distributed shared memory architectures , 1990, ISCA '90.

[12]  Frederica Darema,et al.  A single-program-multiple-data computational model for EPEX/FORTRAN , 1988, Parallel Comput..

[13]  Anoop Gupta,et al.  SPLASH: Stanford parallel applications for shared-memory , 1992, SIGARCH Comput. Archit. News.

[14]  Frederica Darema,et al.  Memory access patterns of parallel scientific programs , 1987, SIGMETRICS '87.

[15]  Anoop Gupta,et al.  Analysis of cache invalidation patterns in multiprocessors , 1989, ASPLOS III.

[16]  Bill Nitzberg,et al.  Distributed shared memory: a survey of issues and algorithms , 1991, Computer.

[17]  Carla Schlatter Ellis,et al.  Experimental comparison of memory management policies for NUMA multiprocessors , 1991, TOCS.

[18]  James H. Patterson,et al.  Portable Programs for Parallel Processors , 1987 .

[19]  Anoop Gupta,et al.  Memory-reference characteristics of multiprocessor applications under MACH , 1988, SIGMETRICS 1988.

[20]  Anoop Gupta,et al.  Comparative performance evaluation of cache-coherent NUMA and COMA architectures , 1992, ISCA '92.

[21]  Donald Yeung,et al.  The MIT Alewife machine: a large-scale distributed-memory multiprocessor , 1991 .

[22]  Sandra Johnson Baylor,et al.  A Study of the Memory Reference Behavior of Engineering/Scientific Applications in Parallel Processors , 1989, ICPP.