A review of experiences with reliable multicast

By understanding how real users have employed reliable multicast in real distributed systems, we can develop insight into the degree to which this technology has matched expectations. This paper reviews a number of applications with that goal in mind. Our findings point to tradeoffs between the form of reliability used by a system and its scalability and performance. We also find that to reach a broad user community (and a commercially interesting market), the technology must be better integrated with component and object-oriented systems architectures. Looking closely at these architectures, however, we identify some assumptions about failure handling that make reliable multicast difficult to exploit. Indeed, the major failures of reliable multicast are associated with attempts to position it within object-oriented systems in ways that focus on transparent recovery from server failures. The broader opportunity appears to involve relatively visible embeddings of these tools into object-oriented architectures, enabling knowledgeable users to make tradeoffs. Fault-tolerance through transparent server replication may be better viewed as an unachievable holy grail.
