Toward a Comprehensive Software-Based DSM System

Software-based Distributed Shared Memory (DSM) systems have been the focus of considerable research effort, aimed primarily at improving performance and consistency protocols. Unfortunately, computer clusters present a number of challenges for any DSM system that cannot be solved through consistency protocols alone. These challenges relate to the ability of a DSM system to adjust to load fluctuations, to handle computers being added to or removed from the cluster, to deal with faults, and to use DSM objects larger than the available physical memory. We present here a proposal for the Synergy Distributed Shared Memory System and its integration with the virtual memory, group communication, and process migration services of the Genesis Cluster Operating System.
