Group Communication in Partitionable Systems: Specification and Algorithms

We give a formal specification and an implementation for a partitionable group communication service in asynchronous distributed systems. Our specification is motivated by the requirements for building "partition-aware" applications that can continue operating without blocking in multiple concurrent partitions and can reconfigure themselves dynamically when partitions merge. The specified service guarantees liveness and excludes trivial solutions; it constitutes a useful basis for building realistic partition-aware applications; and it is implementable in practical asynchronous distributed systems where certain stability conditions hold.
