Dynamic voting algorithms for maintaining the consistency of a replicated database

There are several replica control algorithms for managing replicated files in the face of network partitioning due to site or communication link failures. Pessimistic algorithms ensure consistency at the price of reduced availability; they permit at most one (distinguished) partition to process updates at any given time. The best known pessimistic algorithm, voting, is a “static” algorithm, meaning that all potential distinguished partitions can be listed in advance. We present a dynamic extension of voting called dynamic voting. This algorithm permits updates in a partition provided it contains more than half of the up-to-date copies of the replicated file. We also present an extension of dynamic voting called dynamic voting with linearly ordered copies (abbreviated as dynamic-linear). These algorithms are dynamic because the order in which past distinguished partitions were created plays a role in the selection of the next distinguished partition. Our algorithms have all the virtues of ordinary voting, including its simplicity, and provide improved availability as well. We provide two stochastic models to support the latter claim. In the first (site) model, sites may fail but communication links are infallible; in the second (link) model the reverse is true. We prove that under the site model, dynamic-linear has greater availability than any static algorithm, including weighted voting, if there are four or more sites in the network. In the link model, we consider all biconnected five-site networks and a wide variety of failure and repair rates. In all cases considered, dynamic-linear had greater availability than any static algorithm.
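The update rule for dynamic voting can be illustrated with a minimal sketch. It assumes each copy records a version number and the number of copies that took part in its most recent update. The names `Copy`, `can_update`, and `apply_update` are hypothetical and chosen for illustration only. The sketch does not cover the tie-breaking by linear order used in dynamic-linear, nor concurrency control or commit protocols.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    site: str
    version: int      # number of updates applied to this copy (assumed bookkeeping)
    cardinality: int  # number of copies that took part in that latest update


def can_update(partition):
    """Return True if the copies in `partition` may process an update.

    The partition qualifies when it holds more than half of the
    up-to-date copies, i.e. the copies that took part in the latest update.
    """
    if not partition:
        return False
    latest = max(c.version for c in partition)
    current = [c for c in partition if c.version == latest]
    # Every copy at the latest version recorded the same cardinality,
    # so reading it from any one of them is enough.
    return 2 * len(current) > current[0].cardinality


def apply_update(partition):
    """Bring every copy in the partition up to date and record the new size."""
    if not can_update(partition):
        raise RuntimeError("partition may not process updates")
    new_version = max(c.version for c in partition) + 1
    for c in partition:
        c.version = new_version
        c.cardinality = len(partition)
```

As an example, take five copies A to E, all at version 0 with cardinality 5. The partition {A, B, C} may update, after which those three copies hold version 1 and cardinality 3. If the network then splits further, {A, B} may still update because it holds two of the three up-to-date copies, whereas static majority voting would require three of the five sites. Afterwards {C, D, E} may not update, since C is no longer part of a majority of the current copies.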
