Zeno: Eventually Consistent Byzantine-Fault Tolerance

Many distributed services are hosted at large, shared, geographically diverse data centers, and they use replication to achieve high availability despite the unreachability of an entire data center. Recent events show that non-crash faults occur in these services and may lead to long outages. While Byzantine-Fault Tolerance (BFT) could be used to withstand these faults, current BFT protocols can become unavailable if even a small fraction of their replicas is unreachable. This is because existing BFT protocols favor strong safety guarantees (consistency) over liveness (availability). This paper presents a novel BFT state machine replication protocol called Zeno that trades consistency for higher availability. In particular, Zeno replaces strong consistency (linearizability) with a weaker guarantee (eventual consistency): clients can temporarily miss each other's updates, but when the network is stable, the states from the individual partitions are merged by having the replicas agree on a total order for all requests. We have built a prototype of Zeno, and our evaluation using micro-benchmarks shows that Zeno provides better availability than traditional BFT protocols.
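The merge step described above (partitions diverge, then replicas agree on one total order for all requests and re-execute it) can be illustrated with a small sketch. It is not Zeno's protocol: the BFT agreement that Zeno runs among replicas is replaced here by a deterministic sort on a hypothetical `(client_id, seq)` request identifier, and the replicated service is assumed to be a simple key-value store. The names `Request`, `agree_total_order`, and `replay` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    # Hypothetical request format; (client_id, seq) is assumed to be unique.
    client_id: str
    seq: int
    key: str
    value: str


def agree_total_order(*partition_logs):
    """Stand-in for the agreement step (an assumption, not BFT agreement):
    take the union of every partition's log, drop duplicate requests, and
    sort deterministically so every replica derives the same sequence."""
    merged = set()
    for log in partition_logs:
        merged.update(log)
    return sorted(merged, key=lambda r: (r.client_id, r.seq))


def replay(ordered_requests):
    """Re-execute a request sequence against a fresh key-value state."""
    state = {}
    for req in ordered_requests:
        state[req.key] = req.value
    return state


if __name__ == "__main__":
    # During a partition, each side only sees its own clients' updates.
    log_a = [Request("alice", 1, "x", "a1"), Request("alice", 2, "y", "a2")]
    log_b = [Request("bob", 1, "x", "b1")]
    print(replay(log_a))  # partition A: {'x': 'a1', 'y': 'a2'}
    print(replay(log_b))  # partition B: {'x': 'b1'}

    # Once the network is stable, both sides compute the same merged state.
    print(replay(agree_total_order(log_a, log_b)))  # {'x': 'b1', 'y': 'a2'}
```

Because every replica applies the same ordering rule to the same set of requests, the order in which partition logs are combined does not affect the result, which is the convergence that eventual consistency requires. A real BFT merge must additionally prevent faulty replicas from injecting, dropping, or reordering requests, which this sketch does not attempt.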
