Broadcasting messages in fault-tolerant distributed systems: the benefit of handling input-triggered and output-triggered suspicions differently

This paper investigates the two main and seemingly antagonistic approaches to broadcasting messages reliably in fault-tolerant distributed systems: the approach based on reliable broadcast and the approach based on view synchronous communication (VSC). VSC offers more than reliable broadcast, but this comes at a cost. We show that this cost can be reduced by exploiting the difference between input-triggered and output-triggered suspicions, and by replacing the standard VSC broadcast primitive with two broadcast primitives: one sensitive to input-triggered suspicions and the other sensitive to output-triggered suspicions.
