Strong and weak virtual synchrony in Horus

This paper presents two variants of virtual synchrony, both of which are supported by Horus. The first variant, called strong virtual synchrony, guarantees that every message is delivered within the view in which it is sent. This property is very useful in developing applications, since it reduces both the amount of context information that must be carried on messages and the amount of computation required to process a message. However, it is shown that supporting this property requires the application program to block messages during view changes. An alternative definition, called weak virtual synchrony, which can be implemented without blocking messages, is then presented. This definition still guarantees that messages are delivered within the view in which they were sent, but it relies on a slightly weaker notion of the view in which a message was sent. An implementation of weak virtual synchrony that does not block messages during view changes is also developed in this paper.
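The strong property can be illustrated with a small trace checker. This is only a minimal sketch under stated assumptions: the `Event` record, the `violations` function, and the trace format are hypothetical names chosen for illustration and are not part of the Horus interface. A message counts as a violation when it is delivered at a process whose installed view differs from the view the sender had installed when it sent the message.

```python
from dataclasses import dataclass

# Hypothetical trace record, not a Horus data structure.
@dataclass(frozen=True)
class Event:
    kind: str          # "view", "send", or "deliver"
    process: str       # process at which the event occurs
    view_id: int = 0   # for "view": the view being installed
    msg_id: str = ""   # for "send" and "deliver": the message identifier

def violations(trace):
    """Return ids of messages delivered in a view other than their sending view."""
    current = {}   # process -> view currently installed at that process
    sent_in = {}   # msg_id -> view installed at the sender when it sent
    bad = []
    for ev in trace:
        if ev.kind == "view":
            current[ev.process] = ev.view_id
        elif ev.kind == "send":
            sent_in[ev.msg_id] = current[ev.process]
        elif ev.kind == "deliver":
            if current[ev.process] != sent_in[ev.msg_id]:
                bad.append(ev.msg_id)
    return bad

# Both processes are in view 1 when m1 is sent and delivered.
ok_trace = [
    Event("view", "p", view_id=1),
    Event("view", "q", view_id=1),
    Event("send", "p", msg_id="m1"),
    Event("deliver", "q", msg_id="m1"),
]

# p keeps sending while the view changes: m2 is sent in view 1,
# but q has already installed view 2 by the time it delivers m2.
bad_trace = [
    Event("view", "p", view_id=1),
    Event("view", "q", view_id=1),
    Event("send", "p", msg_id="m2"),
    Event("view", "q", view_id=2),
    Event("deliver", "q", msg_id="m2"),
]

print(violations(ok_trace))
print(violations(bad_trace))
```

The second trace suggests why sends have to be blocked during a view change to keep the strong property: a message sent just before the change can otherwise be delivered after the new view is installed. The sketch covers only the strong definition.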
