The Effect of Contention on the Scalability of Page-Based Software Shared Memory Systems

In this paper, we examine the causes and effects of contention for shared data access in parallel programs running on a software distributed shared memory (DSM) system. Specifically, we experiment with two widely used, page-based protocols: Princeton’s home-based lazy release consistency (HLRC) and TreadMarks. For most of our programs, the two protocols were equally affected by the latency increases that contention causes, and they achieved similar performance. Where they differed significantly, HLRC’s ability to manually eliminate load imbalance was the largest factor accounting for the difference. Finally, to quantify the effects of contention, we either modified the application to eliminate the cause of the contention or modified the underlying protocol to handle it efficiently. Overall, we find that contention has profound effects on performance: eliminating contention reduced execution time by 64% in the most extreme case, even at the relatively modest scale of 32 nodes that we consider in this paper.
