Quantitative Evaluation of BFT Protocols

Byzantine Fault Tolerant (BFT) protocols aim to improve the reliability of distributed systems by tolerating arbitrary failures in a bounded number of nodes. They are usually proven correct with respect to certain safety and liveness properties. However, recent studies have shown that the performance of state-of-the-art BFT protocols degrades drastically in the presence of even a single malicious node. This motivates a formal quantitative analysis of BFT protocols that investigates their performance characteristics under different scenarios. We present HyPerf, a new hybrid methodology that combines model checking and simulation techniques to evaluate the performance of BFT protocols. We build a transition system corresponding to a BFT protocol and systematically explore the set of behaviors the protocol allows. We associate timing information with the operations in the protocol, such as cryptographic operations and message transmission. After the state exploration, we use this timing information to evaluate the performance characteristics of the protocol through simulation. We integrate our framework into Mace, a tool for building and verifying distributed systems, and use it to evaluate the performance of PBFT. We describe two use cases of our methodology. For benign operation of the protocol, we treat the timing information as random variables and compute the probability distribution of the execution times. In the presence of faults, we estimate the worst-case performance of the protocol under various attacks that malicious nodes can employ. Our results show the importance of hybrid techniques in systematically analyzing the performance of large-scale systems.
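
The two use cases can be illustrated with a minimal Python sketch. It is not the HyPerf or Mace implementation: the operation names (`sign`, `verify`, `send`), the timing distributions and their parameters, and the example paths are all hypothetical assumptions chosen for illustration. An explored execution is reduced to a sequential list of operations, which ignores any overlap between concurrent messages. The benign case samples per-operation times to estimate the distribution of the execution time, and the faulty case picks the most expensive path under fixed upper bounds.

```python
import random
import statistics

# Hypothetical per-operation timing model, in milliseconds. The distributions
# and parameters are illustrative assumptions, not measured values.
def sample_op(op, rng):
    if op == "send":
        # Heavy-tailed network delay; paretovariate() >= 1, so delay >= 0.5 ms.
        return 0.5 * rng.paretovariate(3.0)
    if op == "sign":
        return max(0.0, rng.gauss(0.05, 0.005))
    if op == "verify":
        return max(0.0, rng.gauss(0.03, 0.003))
    raise ValueError(f"unknown operation: {op}")

def sample_path_time(path, rng):
    """Draw one execution time for a path (a sequence of operations)."""
    return sum(sample_op(op, rng) for op in path)

def estimate_distribution(path, runs, seed=0):
    """Benign case: Monte Carlo estimate of the execution-time distribution."""
    rng = random.Random(seed)
    samples = sorted(sample_path_time(path, rng) for _ in range(runs))
    return {
        "mean": statistics.fmean(samples),
        "p50": samples[len(samples) // 2],
        "p99": samples[int(0.99 * (len(samples) - 1))],
    }

def worst_case_path(paths, upper_bound):
    """Faulty case: name of the path with the largest total upper-bound cost."""
    return max(paths, key=lambda name: sum(upper_bound[op] for op in paths[name]))

# Hypothetical paths, standing in for executions found by state exploration.
BENIGN_PATH = ["sign", "send", "verify", "send", "verify", "send", "verify", "send"]
PATHS = {
    "benign": BENIGN_PATH,
    # Assumed attack: a malicious node forces extra message rounds.
    "extra_rounds": BENIGN_PATH + ["send", "verify", "send", "verify"],
}
UPPER_BOUND = {"sign": 0.1, "verify": 0.1, "send": 5.0}

if __name__ == "__main__":
    print(estimate_distribution(BENIGN_PATH, runs=10_000))
    print(worst_case_path(PATHS, UPPER_BOUND))
```

Fixing the seed keeps the estimate reproducible across runs. In a real analysis the paths would come from the model checker's state exploration and the timing model from measurements of the target deployment.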
