Newtop: a fault-tolerant group communication protocol

A general-purpose group communication protocol suite called Newtop is described. It is assumed that processes can belong to many groups simultaneously, that groups may be large, and that processes may be communicating over the Internet. An asynchronous communication environment is therefore assumed, in which message transmission times cannot be accurately estimated and the underlying network may become partitioned, preventing functioning processes from communicating with each other. Newtop can provide causality-preserving total order delivery to the members of a group, and it ensures that total order delivery is also preserved for processes that belong to multiple groups. Both symmetric and asymmetric ordering protocols are supported, so a process can use, say, the symmetric version in one group and the asymmetric version in another.
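
To make "causality-preserving total order delivery" concrete, the sketch below shows one generic way a symmetric (decentralised) ordering scheme can work, using Lamport-style logical clocks. It is an illustration under stated assumptions, not Newtop's actual algorithm: the `Member` class, its method names, and the stability rule are hypothetical, channels are assumed to be FIFO and reliable, and failures, partitions, membership changes and multi-group processes are all ignored.

```python
import heapq


class Member:
    """Hypothetical group member; illustrative only, not Newtop's API."""

    def __init__(self, pid, group):
        self.pid = pid
        self.clock = 0                         # Lamport-style logical clock
        self.latest = {p: 0 for p in group}    # highest timestamp seen per sender
        self.pending = []                      # min-heap of (timestamp, sender, payload)
        self.delivered = []                    # messages in delivery order

    def send(self, payload):
        # Stamp the message with a fresh clock value; the sender also
        # receives its own message so that it is ordered with the rest.
        self.clock += 1
        msg = (self.clock, self.pid, payload)
        self.receive(msg)
        return msg

    def receive(self, msg):
        ts, sender, _ = msg
        # Advancing the clock ensures any later send carries a larger timestamp,
        # which is what makes the timestamp order respect causality.
        self.clock = max(self.clock, ts)
        self.latest[sender] = max(self.latest[sender], ts)
        heapq.heappush(self.pending, msg)
        self._deliver_stable()

    def _deliver_stable(self):
        # Assumed stability rule: a message is deliverable once every member
        # has been heard from with a timestamp at least as large, so (with
        # FIFO channels) no message that sorts earlier can still arrive.
        while self.pending and min(self.latest.values()) >= self.pending[0][0]:
            self.delivered.append(heapq.heappop(self.pending))


group = ["a", "b", "c"]
a, b, c = (Member(p, group) for p in group)

m1 = a.send("x")      # concurrent with m2
m2 = b.send("y")
c.receive(m2)         # c sees the two messages in one order...
c.receive(m1)
m3 = c.send("z")      # causally after m1 and m2
a.receive(m2)
a.receive(m3)
b.receive(m3)         # ...and b sees m3 before m1
b.receive(m1)

print(a.delivered == b.delivered == c.delivered)
```

In this run every member delivers `m1` and `m2` in the same order, with ties between equal timestamps broken by sender identifier. `m3` stays pending at all three members because `a` and `b` have not yet sent anything with a timestamp that high, which shows why a scheme of this kind needs some liveness mechanism (for example, periodic null messages from silent members) in addition to the ordering rule.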
