BFT Protocols Under Fire

Much recent work on Byzantine state machine replication focuses on protocols with improved performance under benign conditions (LANs, homogeneous replicas, limited crash faults), with relatively little evaluation under typical, practical conditions (WAN delays, packet loss, transient disconnection, shared resources). This makes it difficult for system designers to choose the appropriate protocol for a real target deployment. Moreover, most protocol implementations differ in their choice of runtime environment, crypto library, and transport, hindering direct protocol comparisons even under similar conditions. We present a simulation environment for such protocols that combines a declarative networking system with a robust network simulator. Protocols can be rapidly implemented from pseudocode in the high-level declarative language of the former, while network conditions and (measured) costs of communication packages and crypto primitives can be plugged into the latter. We show that the resulting simulator faithfully predicts the performance of native protocol implementations, both as published and as measured in our local network. We use the simulator to compare representative protocols under identical conditions and rapidly explore the effects of changes in the costs of crypto operations, workloads, network conditions and faults. For example, we show that Zyzzyva outperforms protocols like PBFT and Q/U under most but not all conditions, indicating that one-size-fits-all protocols may be hard if not impossible to design in practice.
