Nonblocking and orphan-free message logging protocols

Existing message logging protocols exhibit a classic tradeoff between pessimistic and optimistic designs. This paper shows that the tradeoff is not inherent to the problem of message logging. The authors construct a message logging protocol that combines the positive features of both approaches: it prevents orphans and allows simple failure recovery, yet it requires no blocking in failure-free runs. Furthermore, the protocol introduces no additional message overhead compared to one implemented for a system in which messages may be lost but processes do not crash.
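The abstract does not define what an orphan is. As a rough illustration only, the sketch below assumes the usual notion from the message logging literature: an orphan is a surviving process whose state depends on a message delivery that was not logged before another process crashed, so the delivery cannot be replayed during recovery. The names `Process`, `find_orphans`, and `stable_log` are hypothetical and chosen for this example; this is not the authors' protocol, only a way to check the property it is meant to preserve.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    # IDs of message deliveries that this process's state depends on.
    depends_on: set = field(default_factory=set)
    crashed: bool = False


def find_orphans(processes, stable_log):
    """Return the names of surviving processes that depend on an unlogged delivery.

    stable_log holds the IDs of deliveries whose information survived the
    crash and can therefore be replayed during recovery (an assumption of
    this sketch, not a structure taken from the paper).
    """
    return sorted(
        p.name
        for p in processes
        if not p.crashed and not p.depends_on <= stable_log
    )


# p delivered m1 and m2 and then crashed; only m1 was logged in time.
# q received a message that p sent after delivering m2, so q depends on both.
# r only depends on m1.
p = Process("p", {"m1", "m2"}, crashed=True)
q = Process("q", {"m1", "m2"})
r = Process("r", {"m1"})

orphans = find_orphans([p, q, r], stable_log={"m1"})
print(orphans)
```

In this scenario `q` is reported as an orphan because `m2` cannot be replayed, while `r` is not. A pessimistic protocol would typically avoid the situation by delaying `p` until `m2` is logged; the abstract claims the same guarantee can be obtained without that blocking.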

[9]  Lorenzo Alvisi,et al.  Nonblocking and orphan-free message logging protocols , 1992, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[10]  Flaviu Cristian,et al.  Fault-tolerance in the advanced automation system , 1990, EW 4.

[11]  Anita Borg,et al.  A message system supporting fault tolerance , 1983, SOSP '83.

[12]  David B. Johnson,et al.  Sender-Based Message Logging , 1987 .

[13]  David B. Johnson,et al.  Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing , 1988, J. Algorithms.

[14]  Fred B. Schneider,et al.  Byzantine generals in action: implementing fail-stop processors , 1984, TOCS.

[15]  Fred B. Schneider,et al.  Primary-Backup Protocols: Lower Bounds and Optimal Implementations , 1992 .

[16]  D. McCue,et al.  Fault-Tolerance in the Advanced Automation System , 1991, OPSR.

[17]  David F. Bacon,et al.  Volatile logging in n-fault-tolerant distributed systems , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[18]  LamportLeslie Time, clocks, and the ordering of events in a distributed system , 1978 .

[19]  Robert E. Strom,et al.  Optimistic recovery in distributed systems , 1985, TOCS.