Paper information - Restart services for highly available systems

Restart services for highly available systems

This paper proposes a design methodology for building highly available systems. In addition, we describe a set of operating system services that can be used to achieve this goal. The techniques described are intended for a parallel environment and can be generalized to any distributed system. We describe a methodology for providing basic services for high availability, specific services for restart, and an implementation of these services.

[1] Danny Dolev, et al. Highly available cluster: a case study, 1994, Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing.

[2] Dhiraj K. Pradhan, et al. Processor- and memory-based checkpoint and rollback recovery, 1993, Computer.

[3] Daniel P. Siewiorek. Fault tolerance in commercial computers, 1990, Computer.

[4] Hector Garcia-Molina, et al. Elections in a Distributed Computing System, 1982, IEEE Transactions on Computers.

[5] Farnam Jahanian, et al. Strong, weak and hybrid group membership, 1992, [1992 Proceedings] Second Workshop on the Management of Replicated Data.

[6] Victor P. Nelson. Fault-tolerant computing: fundamental concepts, 1990, Computer.

[7] Prithviraj Banerjee, et al. Design and analysis of software reconfiguration strategies for hypercube multicomputers under multiple faults, 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.

[8] Flaviu Cristian, et al. Agreeing on who is present and who is absent in a synchronous distributed system, 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.