Fault-tolerant causal delivery in group communication

In distributed systems, a group of processes cooperates to execute an application program. A group is established among multiple processes, and only the processes in the group communicate with each other. This type of communication is called intra-group communication. The communication system has to support reliable intra-group communication in the presence of process faults. To tolerate process faults, each process in the group is replicated into a collection of replicas called a cluster. In this paper, we propose a new intra-group communication protocol that supports causally ordered delivery of messages among the processes in the group. In addition, the protocol supports reliable delivery of messages in the presence of Byzantine faults of the processes.
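To make "causally ordered delivery" concrete, the following is a minimal sketch of the standard vector-clock delivery test for causal broadcast. It is not the protocol proposed in this paper: it assumes crash-free, non-Byzantine processes and no replication, and the names `Message`, `CausalReceiver`, and `receive` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: int    # index of the sending process (hypothetical field)
    clock: tuple   # sender's vector clock at send time
    payload: str


class CausalReceiver:
    """Buffers incoming messages and delivers them in causal order."""

    def __init__(self, n):
        self.delivered = [0] * n  # messages delivered so far, per sender
        self.pending = []         # received but not yet deliverable
        self.log = []             # payloads in delivery order

    def _deliverable(self, m):
        # m must be the next message from its sender, and every message
        # the sender had delivered before sending m must already be
        # delivered here.
        next_from_sender = m.clock[m.sender] == self.delivered[m.sender] + 1
        deps_met = all(
            m.clock[k] <= self.delivered[k]
            for k in range(len(self.delivered))
            if k != m.sender
        )
        return next_from_sender and deps_met

    def receive(self, m):
        self.pending.append(m)
        progress = True
        # Keep scanning: delivering one message may unblock others.
        while progress:
            progress = False
            for q in list(self.pending):
                if self._deliverable(q):
                    self.pending.remove(q)
                    self.delivered[q.sender] += 1
                    self.log.append(q.payload)
                    progress = True
```

For example, if process 1 sends `m2` after delivering `m1` from process 0, a receiver that gets `m2` first holds it back until `m1` arrives, so `m1` is delivered before `m2`. Tolerating Byzantine faults additionally requires comparing the messages sent by the replicas of a cluster, which this sketch does not attempt.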
