Increasing the resilience of atomic commit, at no additional cost

This paper presents a new atomic commitment protocol that always allows a quorum in the system to make progress. Previously suggested quorum-based protocols (e.g. [12]) allow a quorum to make progress in the case of one failure. If failures cascade, however, and the quorum in the system is "lost" (i.e. at a given time no quorum component exists, e.g. because of a total crash), then under those protocols a quorum can later become connected and still remain blocked. The importance of this work is in demonstrating, using a simple algorithm, how protocols that always allow a majority to make progress can be constructed.

[1]  M. Herlihy. A quorum-consensus replication method for abstract data types, 1986, TOCS.

[2]  Michael Stonebraker, et al. A Formal Model of Crash Recovery in a Distributed System, 1983, IEEE Transactions on Software Engineering.

[3]  David K. Gifford, et al. Weighted voting for replicated data, 1979, SOSP '79.

[4]  Flaviu Cristian, et al. An efficient, fault-tolerant protocol for replicated data management, 1985, Fault-Tolerant Distributed Computing.

[5]  Jim Gray, et al. Notes on Data Base Operating Systems, 1978, Advanced Course: Operating Systems.

[6]  Hector Garcia-Molina, et al. Elections in a Distributed Computing System, 1982, IEEE Transactions on Computers.

[7]  David Peleg, et al. The Availability of Quorum Systems, 1995, Inf. Comput.

[8]  Dale Skeen, et al. A Quorum-Based Commit Protocol, 1982, Berkeley Workshop.

[9]  Maurice Herlihy. Concurrency versus availability: atomicity mechanisms for replicated data, 1987, TOCS.

[10]  Satish K. Tripathi, et al. A Fault-Tolerant Algorithm for Replicated Data Management, 1995, IEEE Trans. Parallel Distributed Syst.

[11]  Francis Y. L. Chin, et al. Optimal termination protocols for network partitioning, 1983, PODS '83.

[12]  Amr El Abbadi, et al. Maintaining availability in partitioned replicated databases, 1987, ACM Trans. Database Syst.

[13]  Tiko Kameda, et al. Site optimal termination protocols for a distributed database under network partitioning, 1985, OPSR.

[14]  Idit Keidar, et al. A Highly Available Paradigm for Consistent Object Replication, 1994.

[15]  David Cheung, et al. Site optimal termination protocols for a distributed database under network partitioning, 1986.