Partial replication in the Database State Machine

This paper investigates the use of partial replication in the Database State Machine approach, which was introduced earlier for fully replicated databases. It builds on the order and atomicity properties of group communication primitives to achieve strong consistency and proposes two new abstractions: Resilient Atomic Commit and Fast Atomic Broadcast. Even with atomic broadcast, partial replication requires a termination protocol such as atomic commit to ensure transaction atomicity. With Resilient Atomic Commit, our termination protocol allows a transaction to commit despite the failure of some of the participants. Preliminary performance studies suggest that the additional cost of supporting partial replication can be mitigated through the use of Fast Atomic Broadcast.
