Run-Time Switching Between Total Order Algorithms

Total order broadcast protocols are a fundamental building block in the construction of many fault-tolerant distributed applications. Unfortunately, total order is an intrinsically expensive operation. Moreover, different algorithms perform better under specific scenarios and network properties. This paper proposes and evaluates an adaptive protocol that can dynamically switch between different total order algorithms. The protocol makes it possible to achieve the best possible performance by selecting, at each moment, the algorithm most appropriate to the current network conditions. Experimental results show that, with our protocol, adaptation can be achieved with negligible interference with the data flow.
