Request Batching Self-Configuration in Byzantine Fault-Tolerant Replication

Replication techniques that tolerate Byzantine failures have been applied in distributed computing to cope with hostile environments in which system components may fail due to malicious causes (e.g., intrusions) or natural ones. Building on the seminal 1982 work of Lamport, Pease and Shostak on the Byzantine Generals Problem, Castro and Liskov proposed PBFT in 1999, a successful solution that overcomes the performance drawbacks of earlier protocols through a number of optimizations, including request batching. PBFT motivated several extensions that improve its performance under particular operating conditions. In these solutions, which we call PBFT-family protocols, the request batching parameters are tuned at design time. However, such a static configuration may not yield the desired performance in dynamic distributed systems, whose underlying characteristics (e.g., workload, channel QoS, network topology) change at run time. To address this challenge, this paper proposes a solution for dynamically configuring batching parameters, inspired by feedback control theory. To evaluate its efficiency, the proposed solution is simulated in various scenarios and compared with the original, statically configured batching used in PBFT-family protocols.
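The abstract does not state the control law, so the following is only a minimal sketch of what feedback-based batching self-configuration can look like, assuming a proportional-integral (PI) controller that tunes just the batch size so that measured request latency tracks a target. The class `BatchSizeController`, the function `simulated_latency_ms`, the gains, and the latency model are all hypothetical illustrations, not the paper's controller and not PBFT's batching code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BatchSizeController:
    """Hypothetical PI controller that tunes the batch size so that the
    measured request latency tracks a target latency (not the paper's design)."""

    target_latency_ms: float
    kp: float = 0.2   # proportional gain (illustrative value)
    ki: float = 0.4   # integral gain (illustrative value)
    min_batch: int = 1
    max_batch: int = 256
    _batch: float = field(default=1.0, init=False)
    _prev_error: Optional[float] = field(default=None, init=False)

    def __post_init__(self) -> None:
        # Start from the smallest batch, i.e. no batching at all.
        self._batch = float(self.min_batch)

    @property
    def batch_size(self) -> int:
        return round(self._batch)

    def update(self, measured_latency_ms: float) -> int:
        """Feed back one latency measurement and return the next batch size."""
        # Positive error: latency is below target, so batches may grow.
        error = self.target_latency_ms - measured_latency_ms
        d_error = 0.0 if self._prev_error is None else error - self._prev_error
        self._prev_error = error
        # Incremental (velocity) form of the PI law: apply a change to the batch size.
        self._batch += self.kp * d_error + self.ki * error
        # Clamping the internal state also prevents integral wind-up.
        self._batch = min(max(self._batch, float(self.min_batch)), float(self.max_batch))
        return self.batch_size


def simulated_latency_ms(batch: int, arrival_rate_rps: float,
                         per_batch_ms: float = 2.0,
                         per_request_ms: float = 0.05) -> float:
    """Toy latency model (an assumption): a fixed agreement cost per batch,
    a small per-request cost, and the mean wait for the batch to fill."""
    fill_wait_ms = 1000.0 * (batch - 1) / (2.0 * arrival_rate_rps)
    return per_batch_ms + per_request_ms * batch + fill_wait_ms


if __name__ == "__main__":
    ctrl = BatchSizeController(target_latency_ms=20.0)
    batch = ctrl.batch_size
    for step in range(400):
        # The workload changes at run time: the arrival rate drops at step 200.
        rate = 2000.0 if step < 200 else 500.0
        batch = ctrl.update(simulated_latency_ms(batch, rate))
        if step in (199, 399):
            print(f"step {step}: rate={rate:.0f} req/s, batch={batch}")
```

A design-time configuration corresponds to holding `batch` constant regardless of the measurements. In this toy model the controller instead settles at about 60 to 61 requests per batch at 2000 req/s and shrinks to about 18 to 19 after the rate drops to 500 req/s, keeping latency near the 20 ms target. The incremental form is chosen because clamping the batch size then bounds the integral state as well.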

[1] Leslie Lamport, et al. The Byzantine Generals Problem, 1982, TOPL.

[2] Fred B. Schneider, et al. Implementing fault-tolerant services using the state machine approach: a tutorial, 1990, CSUR.

[3] Jason Flinn, et al. Tolerating Latency in Replicated State Machines Through Client Speculation, 2009, NSDI.

[4] Miguel Castro, et al. Practical Byzantine fault tolerance and proactive recovery, 2002, TOCS.

[5] Miguel Oom Temudo de Castro, et al. Practical Byzantine fault tolerance, 1999, OSDI '99.

[6] Raimundo José de Araújo Macêdo, et al. A Self-Manageable Group Communication Protocol for Partially Synchronous Distributed Systems, 2011, 5th Latin-American Symposium on Dependable Computing.

[7] Michael K. Reiter, et al. Dynamic Byzantine quorum systems, 2000, Proceedings International Conference on Dependable Systems and Networks (DSN 2000).

[8] Yoram Moses, et al. Fully polynomial Byzantine agreement in t + 1 rounds, 1993, STOC.

[9] Michael K. Reiter, et al. Unreliable intrusion detection in distributed computations, 1997, Proceedings 10th Computer Security Foundations Workshop.

[10] Raimundo José de Araújo Macêdo, et al. QoS self-configuring failure detectors for distributed systems, 2010, DAIS'10.

[11] Silvio Micali, et al. Optimal algorithms for Byzantine agreement, 1988, STOC '88.

[12] Ramakrishna Kotla, et al. Zyzzyva: speculative Byzantine fault tolerance, 2007, TOCS.

[13] Allen B. Downey, et al. Evidence for long-tailed distributions in the internet, 2001, IMW '01.

[14] Douglas M. Blough, et al. A reconfigurable Byzantine quorum approach for the Agile Store, 2003, 22nd International Symposium on Reliable Distributed Systems, 2003. Proceedings.

[15] Raj Jain, et al. A delay-based approach for congestion avoidance in interconnected heterogeneous computer networks, 1989, CCRV.

[16] Nancy A. Lynch, et al. Consensus in the presence of partial synchrony, 1988, JACM.