Replicated transactions

A scheme to replicate transactions is described. The scheme allows a k-replicated transaction to survive (k-1) failures. No coordination among the k replicas is needed until one of them reaches the end of the transaction and proceeds to abort the others. Consequently, the scheme avoids the overhead and delay caused by failure detection, reconfiguration, and synchronization during the k replicas' execution. Also described are a robust commit protocol that chooses which transaction replica is committed, and a procedure that chooses the nodes on which the transaction replicas are executed. The goal of that procedure is to maximize reliability.
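The abstract does not spell out the commit protocol or the node-selection procedure, so the following is only a minimal sketch of the idea under stated assumptions. It assumes (1) a single atomic "first claim wins" commit record that replicas contact only when they reach the end, and (2) independent node failures with known per-node reliabilities. The names `Replica`, `CommitRecord`, `run_replicated`, `survival_probability`, and `choose_nodes` are hypothetical. The robust commit protocol described in the scheme presumably also tolerates failure of the commit record itself, which this sketch does not attempt.

```python
import math
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    node: str
    # Hypothetical model: time at which the replica reaches the end of the
    # transaction, or None if its node fails before it gets there.
    finish_time: Optional[float]


class CommitRecord:
    """Hypothetical single-assignment record: only the first claim succeeds.

    In a real system this would have to be an atomic, failure-resilient
    object; a lock stands in for that here.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.winner: Optional[str] = None

    def try_commit(self, node: str) -> bool:
        with self._lock:
            if self.winner is None:
                self.winner = node
                return True
            return False


def run_replicated(replicas: List[Replica]) -> Tuple[Optional[str], List[str]]:
    """Replicas execute without coordinating; they only meet at the commit record."""
    record = CommitRecord()
    # Replay the order in which surviving replicas reach the end.
    finishers = sorted(
        (r for r in replicas if r.finish_time is not None),
        key=lambda r: r.finish_time,
    )
    for r in finishers:
        record.try_commit(r.node)  # every call after the first returns False
    # The winner aborts all other replicas (losers and crashed ones alike).
    # If every replica failed, winner is None and the transaction aborts.
    others = [r.node for r in replicas if r.node != record.winner]
    return record.winner, others


def survival_probability(reliabilities: List[float]) -> float:
    """P(at least one replica finishes), assuming independent node failures."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def choose_nodes(node_reliability: Dict[str, float], k: int) -> List[str]:
    """Pick k nodes. survival_probability is increasing in each r_i,
    so under independence the k most reliable nodes maximize it."""
    return sorted(node_reliability, key=node_reliability.get, reverse=True)[:k]
```

For example, with replicas on nodes A (finishing at 5.0), B (crashed), and C (finishing at 3.0), `run_replicated` returns `("C", ["A", "B"])`: the transaction still commits after a failure, and only the winner's commit involves any coordination. Likewise, `choose_nodes({"A": 0.9, "B": 0.8, "C": 0.95, "D": 0.5}, 3)` returns `["C", "A", "B"]`, whose survival probability is about 0.999. The greedy choice is optimal only under the independence assumption; correlated failures (a shared power supply or network partition, say) would call for a different selection procedure.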