Probabilistic Analysis of Disagreement in Synchronous Consensus Protocols (Preliminary draft)

This report presents a probabilistic analysis of a family of simple synchronous round-based consensus algorithms for the 1-of-n selection problem. In this problem, a set of n nodes must select one common value from a set of n proposed values. Each node’s selection process has two possible outcomes: the node either decides to select a value or decides to abort. Agreement means that either all nodes select the same value or all nodes decide to abort. We analyse this problem under the assumption of massive communication failures, considering both symmetric and asymmetric message losses. Previous research has shown that it is impossible to guarantee agreement among the nodes of a synchronous system subjected to an unbounded number of message losses. Our aim is therefore to find algorithms for which the probability of disagreement is as low as possible. To this end, we study how the probability of disagreement varies for three decision criteria: the optimistic, the pessimistic, and the moderately pessimistic. Our results show that the probability of disagreement varies significantly with the number of nodes, the number of rounds, and the probability of message loss. In general, the optimistic decision criterion performs better (has a lower probability of disagreement) than the pessimistic one when the probability of message loss is below a threshold that lies between 30% and 70%. On the other hand, the optimistic decision criterion generally has a higher maximum probability of disagreement than the pessimistic decision criterion. Moreover, we show that the outcome of the moderately pessimistic decision criterion generally lies between those of the other two decision criteria.
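
The agreement condition and the quantity under study can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration, not the report's method: each message is lost independently with probability `loss_prob`, each node proposes its own index, and a node selects the smallest proposal only if it has heard all n proposals, otherwise it aborts. This single hypothetical rule is not one of the three decision criteria named above, which this abstract does not define. The function names `simulate_once`, `is_agreement` and `estimate_disagreement` are likewise hypothetical.

```python
import random


def simulate_once(n, rounds, loss_prob, rng):
    """Run one execution and return each node's decision (a value, or None for abort)."""
    # views[i] is the set of proposals node i knows; initially only its own.
    views = [{i} for i in range(n)]
    for _ in range(rounds):
        # Synchronous round: every message carries the sender's view
        # as it was at the start of the round.
        snapshot = [set(v) for v in views]
        for sender in range(n):
            for receiver in range(n):
                if sender == receiver:
                    continue
                # Assumed loss model: each message is dropped independently.
                if rng.random() >= loss_prob:
                    views[receiver] |= snapshot[sender]
    # Hypothetical decision rule: select the smallest proposal when the
    # view is complete, otherwise abort.
    return [min(v) if len(v) == n else None for v in views]


def is_agreement(decisions):
    """True if all nodes selected the same value or all nodes aborted."""
    return len(set(decisions)) == 1


def estimate_disagreement(n, rounds, loss_prob, trials=10000, seed=0):
    """Estimate the probability of disagreement by repeated simulation."""
    rng = random.Random(seed)
    disagreements = sum(
        not is_agreement(simulate_once(n, rounds, loss_prob, rng))
        for _ in range(trials)
    )
    return disagreements / trials


if __name__ == "__main__":
    for q in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(q, estimate_disagreement(n=4, rounds=2, loss_prob=q))
```

Under these assumptions the estimate is zero at both extremes: with no losses every node obtains a complete view and selects, and with total loss every node aborts. Disagreement arises only in between, when some nodes obtain a complete view and others do not.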
