Experience with Modularity in Consul

Modularity in the design and implementation of complex software simplifies development and facilitates the construction of customized configurations. This paper describes our experience using modularity in Consul, a communication substrate for constructing fault-tolerant distributed programs. First, Consul is presented as a case study showing that modularity is feasible in both the design and the implementation of such systems. Second, general lessons about modularity in fault-tolerant systems, drawn from our experience with Consul, are given. The issues addressed include deciding how to divide the system into modules, dealing with problems that arise when protocols are combined, and ensuring that the underlying object infrastructure provides adequate support. The key observation is that the modularization process is most affected by dependencies between modules: direct dependencies, in which one module explicitly uses another's operations, and indirect dependencies, in which one module is affected by another without direct interaction. Although our observations are based on designing and implementing Consul, the lessons are applicable to any fault-tolerant distributed system.
