Paper information - Automatic reconfiguration in the presence of failures

Automatic reconfiguration in the presence of failures

The paper describes a new kind of distributed system service, the availability management service, responsible for ensuring that the critical services of a distributed system remain continuously available to users despite arbitrary numbers of concurrent node removals and node restarts caused by failures, maintenance, and growth. It stresses the main ideas behind this new service and outlines a simple design that depends on the existence of synchronous membership and atomic broadcast group communication services. Extensions of this initial design to deal with asynchronous group communication services are also briefly discussed.

Flaviu Cristian
