Tradeoffs between false sharing and aggregation in software distributed shared memory

Software Distributed Shared Memory (DSM) systems based on virtual memory techniques traditionally use the hardware page as the consistency unit. The large size of the hardware page is considered to be a performance bottleneck because of the implied false sharing overheads. Instead, we show that in the presence of a relaxed consistency model and a multiple writer protocol, a large consistency unit is generally not detrimental to performance. We study the tradeoffs between false sharing and aggregation effects when using large consistency units. In this context, this paper makes three separate contributions:

1. We document the cost of false sharing in terms of extra messages and extra data being communicated. We find that, for the applications considered, when the virtual memory page is used as the consistency unit, the number of extra messages is small, while the amount of extra data can be substantial.

2. We evaluate the performance when the consistency unit is increased to a multiple of the virtual memory page size. For most applications and data sets, the performance improves, except when the false sharing effects include extra messages or a large amount of extra data.

3. We present a new algorithm for dynamically aggregating pages. In our algorithm, the aggregated pages do not necessarily need to be contiguous. In all cases, the performance of our dynamic aggregation algorithm is similar to that achieved with the best static page size.

These results were obtained by measuring the performance of eight applications on the TreadMarks distributed shared memory system. The hardware platform used is a network of 166 MHz Pentiums connected by a switched 100 Mbps Ethernet network.
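The tradeoff between aggregation and false sharing can be illustrated with a toy cost model. The sketch below is not the paper's measurement method or its aggregation algorithm; the function name `fetch_cost` and the simplifying assumptions (one message exchange per consistency unit fetched, and every page in a fetched unit being transferred) are hypothetical and chosen only for illustration. It does not model the extra messages that arise when several writers have modified the same unit.

```python
def fetch_cost(needed_pages, unit_pages):
    """Toy cost of bringing one processor up to date.

    needed_pages: indices of the pages the processor will actually access.
    unit_pages:   size of the consistency unit, in pages.
    Returns (exchanges, pages_moved, useless_pages).

    Assumptions (illustrative only): one message exchange per unit
    fetched, and every page in a fetched unit is transferred.
    """
    needed = set(needed_pages)
    # Each needed page falls into exactly one consistency unit.
    units = {page // unit_pages for page in needed}
    exchanges = len(units)
    pages_moved = exchanges * unit_pages
    # Pages transferred but never accessed: the false sharing cost in data.
    useless_pages = pages_moved - len(needed)
    return exchanges, pages_moved, useless_pages


# Dense access: a larger unit aggregates eight fetches into two.
dense_small = fetch_cost(range(8), 1)
dense_large = fetch_cost(range(8), 4)

# Sparse access: a larger unit saves no exchanges and moves unused pages.
sparse_small = fetch_cost([0, 8, 16, 24], 1)
sparse_large = fetch_cost([0, 8, 16, 24], 4)
```

Under these assumptions, a dense access pattern benefits from the larger unit (fewer exchanges, no useless data), while a sparse pattern pays for it in extra data, which mirrors the qualitative tradeoff described in the abstract.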
