A Fault-Tolerant Algorithm for Replicated Data Management

We examine the tradeoff between message overhead and data availability that arises in the design of fault-tolerant algorithms for replicated data management in distributed systems. We propose a property called asymptotically high resiliency which is useful for evaluating the fault-tolerance of replica control algorithms and distributed mutual exclusion algorithms. We present a new algorithm for replica control that can be tailored (through a design parameter) to achieve the desired balance between low message overhead and high data availability. Further, we show that for a message overhead of $O(\sqrt{N \log N})$, our algorithm can achieve asymptotically high resiliency.
