Experiences with Oasis+: a fault tolerant storage system

The Oasis+ distributed storage system is a reliable memory store for small-scale computing clusters. It is implemented entirely in memory using Distributed Shared Memory (DSM) and was built to operate as a backbone service for a computing cluster that supports mobile workstations or remote clients needing fast access to storage. The system stores data quickly and dependably, in part by using a high-performance, high-availability page-based protocol called BR [1]. BR guarantees robust functionality despite multiple site failures. By integrating address range locking and eager release consistency (ERC) [2], Oasis+ provides a flexible and efficient platform for the development of distributed services. Reliability is achieved through replication and through corrective cleanup and recovery actions once failures occur.

[1] Mukesh Singhal, et al. Advanced Concepts In Operating Systems, 1994.

[2] Brett D. Fleisch, et al. DBRpc: a highly adaptable protocol for reliable DSM systems, 1999, Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No.99CB37003).

[3] Kenneth P. Birman, et al. Reliable communication in the presence of failures, 1987, TOCS.

[4] Anoop Gupta, et al. The directory-based cache coherence protocol for the DASH multiprocessor, 1990, ISCA '90.

[5] Brett D. Fleisch, et al. The Boundary-Restricted Coherence Protocol for Scalable and Highly Available Distributed Shared Memory Systems, 1996, Comput. J.

[6] A. Knaff, et al. Reliable support for a persistent distributed shared memory, 1997, Proceedings of 17th International Conference on Distributed Computing Systems.

[7] Brett D. Fleisch, et al. A Dynamic Coherence Protocol for Distributed Shared Memory Enforcing High Data Availability at Low Costs, 1996, IEEE Trans. Parallel Distributed Syst.

[8] Rajkumar Buyya, et al. 2001 IEEE International Conference on Cluster Computing, 2001.

[9] Leigh Stoller, et al. Making distributed shared memory simple, yet efficient, 1998, Proceedings Third International Workshop on High-Level Parallel Programming Models and Supportive Environments.