Weak-consistency group communication and membership

Many distributed systems for wide-area networks can be built conveniently, and can operate efficiently and correctly, using a weak-consistency group communication mechanism. This mechanism organizes a set of principals into a single logical entity and provides methods to multicast messages to the members. A weak-consistency distributed system allows the principals in the group to differ on the value of shared state at any given instant, as long as they eventually converge to a single, consistent value. A group that contains many principals and uses weak consistency can provide the reliability, performance, and scalability necessary for wide-area systems. I have developed a framework for constructing group communication systems, for classifying existing distributed system tools, and for constructing and reasoning about a particular group communication model. The framework has four components: message delivery, message ordering, group membership, and the application. Each component may have a different implementation, so that the group mechanism can be tailored to application requirements. The framework supports a new message delivery protocol, called timestamped anti-entropy, which provides reliable, eventual message delivery, is efficient, and tolerates most transient processor and network failures. It can be combined with message ordering implementations that provide guarantees ranging from unordered delivery to total, causal delivery. A new group membership protocol completes the set: it allows membership views to be temporarily inconsistent and is resilient to up to k simultaneous principal failures. The Refdbms distributed bibliographic database system, which has been constructed using this framework, is used as an example. Refdbms databases can be replicated at many different sites using the group communication system described here.
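
The abstract does not spell out how timestamped anti-entropy works, so the following is only a minimal sketch of the general idea of pairwise anti-entropy with per-origin timestamps, not the protocol as actually specified. The names `Principal`, `summary`, `missing_for`, and `anti_entropy` are hypothetical, and the sketch assumes each principal stamps its own messages with a local counter and that exchanges are never interrupted.

```python
from dataclasses import dataclass, field


@dataclass
class Principal:
    """Hypothetical group member holding a message log and a summary vector."""
    name: str
    clock: int = 0                              # local counter used as a timestamp (assumption)
    log: dict = field(default_factory=dict)     # (origin, timestamp) -> message body
    summary: dict = field(default_factory=dict) # origin -> highest timestamp held from it

    def send(self, body):
        """Originate a message: stamp it, log it locally, update the summary."""
        self.clock += 1
        self.log[(self.name, self.clock)] = body
        self.summary[self.name] = self.clock

    def missing_for(self, peer_summary):
        """Return the logged messages that are newer than the peer's summary."""
        return {
            key: body
            for key, body in self.log.items()
            if key[1] > peer_summary.get(key[0], 0)
        }

    def receive(self, messages):
        """Merge messages received from a peer, in timestamp order per origin."""
        for (origin, ts), body in sorted(messages.items()):
            self.log[(origin, ts)] = body
            self.summary[origin] = max(self.summary.get(origin, 0), ts)


def anti_entropy(a, b):
    """One pairwise session: each side sends what the other has not yet seen."""
    to_b = a.missing_for(b.summary)
    to_a = b.missing_for(a.summary)
    b.receive(to_b)
    a.receive(to_a)


a, b, c = Principal("a"), Principal("b"), Principal("c")
a.send("x")
b.send("y")
c.send("z")
for left, right in [(a, b), (b, c), (a, c)]:
    anti_entropy(left, right)
```

After the three sessions all principals hold the same log, which illustrates the eventual convergence that weak consistency relies on. Between sessions their logs differ, which is the temporary inconsistency the group tolerates.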
