Semantically reliable group communication

The use of computers and data communication networks for a wide variety of daily tasks calls for widespread deployment of fault tolerance techniques on inexpensive off-the-shelf hardware and software. In this context, group communication is a particularly appealing technology, as it provides the application programmer with reliability guarantees that greatly simplify many fault tolerance techniques. It has been reported, however, that the performance of group communication toolkits in large and heterogeneous systems is frequently disappointing. Although this can be overcome by relaxing reliability guarantees, the resulting protocol is often much less useful than group communication, in particular for strongly consistent replication. The challenge is thus to relax reliability while still providing a convenient set of guarantees for fault-tolerant programming. This thesis addresses models and mechanisms that, by selectively relaxing reliability guarantees, offer both the convenience of group communication for fault-tolerant programming and high performance. The key to our proposal is to use knowledge about the semantics of the messages exchanged to determine which messages need to be reliably delivered, hence the name semantic reliability. In many applications, some messages implicitly convey or overwrite the content of other messages sent shortly before, making the earlier messages obsolete while they are still in transit. By omitting the delivery of obsolete messages only, performance can be improved without affecting the correctness of the application. Specifications and algorithms for a complete semantically reliable group communication protocol suite are introduced, encompassing ordered and view synchronous multicast. The protocols are then evaluated with analytical and simulation models and with a prototype implementation. The discussion of a concrete application illustrates the resulting programming interface and performance.
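The idea of omitting only obsolete messages can be illustrated with a minimal sketch. It is not the protocol from the thesis: the `Message` fields, the `obsoletes` relation (a later update to the same key overwrites an earlier one), and the `PurgingBuffer` class are all hypothetical names chosen for illustration, and a real protocol would also have to deal with ordering, views, and failures.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    seq: int    # sender-assigned sequence number (assumed)
    key: str    # hypothetical tag naming the item this message updates
    value: int


def obsoletes(newer: Message, older: Message) -> bool:
    """Assumed obsolescence relation: a later update to the same key
    overwrites an earlier one."""
    return newer.key == older.key and newer.seq > older.seq


class PurgingBuffer:
    """Queue of messages in transit that drops entries made obsolete
    by a newer message before they are delivered."""

    def __init__(self) -> None:
        self._pending = deque()

    def enqueue(self, msg: Message) -> None:
        # Purge queued messages that the new message overwrites,
        # then queue the new message itself.
        self._pending = deque(
            m for m in self._pending if not obsoletes(msg, m)
        )
        self._pending.append(msg)

    def drain(self) -> list:
        # Hand over everything still pending, in queue order.
        out = list(self._pending)
        self._pending.clear()
        return out


buf = PurgingBuffer()
updates = [("x", 1), ("y", 5), ("x", 2), ("x", 3)]
for seq, (key, value) in enumerate(updates):
    buf.enqueue(Message(seq, key, value))

delivered = buf.drain()
```

In this sketch only the update to `y` and the last update to `x` remain in `delivered`. A receiver that simply applies updates ends in the same state as if all four messages had been delivered, which is the sense in which skipping obsolete messages does not affect application correctness.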
