On the Performance of a Retransmission-Based Synchronizer

Designing algorithms for distributed systems is often simpler when the system provides a round abstraction. Distributed systems must also tolerate various kinds of failures. A synchronizer addresses both concerns: it constructs rounds and masks transmission failures. One simple way to deal with transmission failures is to retransmit a message until it is known to have been received successfully. We calculate the exact average rate of a retransmission-based synchronizer in environments with probabilistic message loss, in which the synchronizer shows nontrivial timing behavior. We show how to make this calculation efficient and present analytical results on the convergence speed. The theoretical results, based on Markov theory, are backed up by Monte Carlo simulations.
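To make the notion of an average rate under probabilistic message loss concrete, the following is a minimal Monte Carlo sketch. It assumes a simplified model that is not taken from the paper: time is discrete, every process rebroadcasts its current round number in every step, each message is delivered independently with probability p, and a process advances to the next round once it has heard its current round (or a later one) from every process. The function name simulate_rate and all of its parameters are hypothetical.

```python
import random


def simulate_rate(n, p, steps, seed=0):
    """Estimate rounds completed per time step by process 0.

    Assumed toy model (not the paper's exact algorithm):
    n processes, per-message delivery probability p, discrete steps.
    """
    rng = random.Random(seed)
    rounds = [0] * n
    # heard[j][i] is the highest round number of process i known to j;
    # -1 means j has not heard from i yet.
    heard = [[-1] * n for _ in range(n)]
    for _ in range(steps):
        # Every process retransmits its current round number; a process
        # always knows its own round, other messages may be lost.
        for j in range(n):
            for i in range(n):
                if i == j or rng.random() < p:
                    heard[j][i] = rounds[i]
        # A process advances once it has heard its current round
        # (or a later one) from all processes.
        rounds = [r + 1 if min(heard[j]) >= r else r
                  for j, r in enumerate(rounds)]
    return rounds[0] / steps


if __name__ == "__main__":
    for p in (0.25, 0.5, 0.75, 1.0):
        print(p, simulate_rate(n=4, p=p, steps=20000))
```

With p = 1 every step completes a round, so the estimate is exactly 1; for smaller p the estimate drops below 1, which is the quantity the Markov-chain analysis computes exactly rather than by sampling.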

[12] Matthias Függer, et al. On the performance of a retransmission-based synchronizer, 2013, Theor. Comput. Sci.
