Zyzzyva

A longstanding vision in distributed systems is to build reliable systems from unreliable components. An enticing formulation of this vision is Byzantine fault-tolerant (BFT) state machine replication, in which a group of servers collectively act as a correct server even if some of the servers misbehave or malfunction in arbitrary ("Byzantine") ways. Despite this promise, practitioners hesitate to deploy BFT systems at least partly because of the perception that BFT must impose high overheads. In this article, we present Zyzzyva, a protocol that uses speculation to reduce the cost of BFT replication. In Zyzzyva, replicas reply to a client's request without first running an expensive three-phase commit protocol to agree on the order to process requests. Instead, they optimistically adopt the order proposed by a primary server, process the request, and reply immediately to the client. If the primary is faulty, replicas can become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas converge on a single total ordering of requests, and only rely on responses that are consistent with this total order. This approach allows Zyzzyva to reduce replication overheads to near their theoretical minima and to achieve throughputs of tens of thousands of requests per second, making BFT replication practical for a broad range of demanding services.

[1]  LamportLeslie Using Time Instead of Timeout for Fault-Tolerant Distributed Systems. , 1984 .

[2]  Michael Williams,et al.  Replication in the harp file system , 1991, SOSP '91.

[3]  GhemawatSanjay,et al.  The Google file system , 2003 .

[4]  Ramakrishna Kotla,et al.  SafeStore: A Durable and Practical Storage System , 2007, USENIX Annual Technical Conference.

[5]  Leslie Lamport,et al.  Using Time Instead of Timeout for Fault-Tolerant Distributed Systems. , 1984, TOPL.

[6]  Leslie Lamport,et al.  The part-time parliament , 1998, TOCS.

[7]  Andrea C. Arpaci-Dusseau,et al.  IRON file systems , 2005, SOSP '05.

[8]  Leslie Lamport,et al.  Reaching Agreement in the Presence of Faults , 1980, JACM.

[9]  Junfeng Yang,et al.  Using model checking to find serious file system errors , 2004, TOCS.

[10]  Michael K. Reiter,et al.  Fault-scalable Byzantine fault-tolerant services , 2005, SOSP '05.

[11]  Joel S. Emer,et al.  The soft error problem: an architectural perspective , 2005, 11th International Symposium on High-Performance Computer Architecture.

[12]  Jason Flinn,et al.  Speculative execution in a distributed file system , 2005, SOSP '05.

[13]  Leslie Lamport,et al.  Lower bounds for asynchronous consensus , 2006, Distributed Computing.

[14]  Jason Flinn,et al.  Rethink the sync , 2006, OSDI '06.

[15]  Nancy G. Leveson,et al.  An experimental evaluation of the assumption of independence in multiversion programming , 1986, IEEE Transactions on Software Engineering.

[16]  Maurice Herlihy,et al.  Linearizability: a correctness condition for concurrent objects , 1990, TOPL.

[17]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[18]  Gustavo Alonso,et al.  Understanding replication in databases and distributed systems , 2000, Proceedings 20th IEEE International Conference on Distributed Computing Systems.

[19]  William I. Nowicki,et al.  NFS: Network File System Protocol specification , 1989, RFC.

[20]  Michael Dahlin,et al.  BAR fault tolerance for cooperative services , 2005, SOSP '05.

[21]  Arun Venkataramani,et al.  Separating agreement from execution for byzantine fault tolerant services , 2003, SOSP '03.

[22]  Miguel Castro,et al.  Practical byzantine fault tolerance and proactive recovery , 2002, TOCS.

[23]  Jean-Philippe Martin,et al.  Fast Byzantine Consensus , 2006, IEEE Transactions on Dependable and Secure Computing.

[24]  Miguel Oom Temudo de Castro,et al.  Practical Byzantine fault tolerance , 1999, OSDI '99.

[25]  Ramakrishna Kotla,et al.  Zyzzyva: speculative byzantine fault tolerance , 2007, TOCS.

[26]  Miguel Castro,et al.  BASE: Using abstraction to improve fault tolerance , 2003, TOCS.

[27]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[28]  Lisa Spainhower,et al.  Commercial fault tolerance: a tale of two systems , 2004, IEEE Transactions on Dependable and Secure Computing.

[29]  Mihir Bellare,et al.  A New Paradigm for Collision-Free Hashing: Incrementality at Reduced Cost , 1997, EUROCRYPT.

[30]  Lorenzo Alvisi,et al.  Modeling the effect of technology trends on the soft error rate of combinational logic , 2002, Proceedings International Conference on Dependable Systems and Networks.

[31]  Samuel T. King,et al.  Backtracking intrusions , 2003, SOSP '03.

[32]  Liuba Shrira,et al.  HQ replication: a hybrid quorum protocol for byzantine fault tolerance , 2006, OSDI '06.

[33]  Michael K. Reiter,et al.  Byzantine quorum systems , 1997, STOC '97.

[34]  Junfeng Yang,et al.  EXPLODE: a lightweight, general system for finding serious storage system errors , 2006, OSDI '06.

[35]  R. Guerraoui,et al.  Best-Case Complexity of Asynchronous Byzantine Consensus , 2005 .

[36]  Norman C. Hutchinson,et al.  Deciding when to forget in the Elephant file system , 1999, SOSP.

[37]  Nancy A. Lynch,et al.  Consensus in the presence of partial synchrony , 1988, JACM.

[38]  DahlinMike,et al.  Separating agreement from execution for byzantine fault tolerant services , 2003 .

[39]  Ramakrishna Kotla,et al.  High throughput Byzantine fault tolerance , 2004, International Conference on Dependable Systems and Networks, 2004.

[40]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1983, PODS '83.

[41]  Leslie Lamport,et al.  The Byzantine Generals Problem , 1982, TOPL.

[42]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.