A low latency, loss tolerant architecture and protocol for wide area group communication

Group communication systems are proven tools upon which to build fault-tolerant systems. As the demands for fault tolerance increase and more applications require reliable distributed computing over wide area networks, wide area group communication systems are becoming increasingly useful. However, building a wide area group communication system is a challenge. This paper presents the design of the transport protocols of the Spread wide area group communication system. We focus on two aspects of the system: first, the value of using overlay networks for application level group communication services; second, the requirements and design of effective low latency link protocols used to construct wide area group communication. We support our claims with the results of live experiments conducted over the Internet.
