Providing high availability using lazy replication

To provide high availability for services such as mail or bulletin boards, data must be replicated. One way to guarantee consistency of replicated data is to force service operations to occur in the same order at all sites, but this approach is expensive. For some applications a weaker causal operation order can preserve consistency while providing better performance. This paper describes a new way of implementing causal operations. Our technique also supports two other kinds of operations: operations that are totally ordered with respect to one another and operations that are totally ordered with respect to all other operations. The method performs well in terms of response time, operation-processing capacity, amount of stored state, and number and size of messages; it does better than replication methods based on reliable multicast techniques.
