Hybrids on Steroids: SGX-Based High Performance BFT

With the advent of trusted execution environments provided by recent general purpose processors, a class of replication protocols has become more attractive than ever: Protocols based on a hybrid fault model are able to tolerate arbitrary faults yet reduce the costs significantly compared to their traditional Byzantine relatives by employing a small subsystem trusted to only fail by crashing. Unfortunately, existing proposals have their own price: We are not aware of any hybrid protocol that is backed by a comprehensive formal specification, complicating the reasoning about correctness and implications. Moreover, current protocols of that class have to be performed largely sequentially. Hence, they are not well-prepared for just the modern multi-core processors that bring their very own fault model to a broad audience. In this paper, we present Hybster, a new hybrid state-machine replication protocol that is highly parallelizable and specified formally. With over 1 million operations per second using only four cores, the evaluation of our Intel SGX-based prototype implementation shows that Hybster makes hybrid state-machine replication a viable option even for today's very demanding critical services.

[1]  Miguel Castro,et al.  Using abstraction to improve fault tolerance , 2001, Proceedings Eighth Workshop on Hot Topics in Operating Systems.

[2]  John Lane,et al.  Byzantine replication under attack , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[3]  Miguel Castro,et al.  A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm , 1999 .

[4]  Miguel Castro,et al.  BASE: using abstraction to improve fault tolerance , 2001, SOSP.

[5]  Miguel Oom Temudo de Castro,et al.  Practical Byzantine fault tolerance , 1999, OSDI '99.

[6]  Liuba Shrira,et al.  HQ replication: a hybrid quorum protocol for byzantine fault tolerance , 2006, OSDI '06.

[7]  Johannes Behl,et al.  Consensus-Oriented Parallelization: How to Earn Your First Million , 2015, Middleware.

[8]  Marko Vukolic,et al.  The Quest for Scalable Blockchain Fabric: Proof-of-Work vs. BFT Replication , 2015, iNetSeC.

[9]  Miguel Correia,et al.  Worm-IT - A wormhole-based intrusion-tolerant group communication system , 2007, J. Syst. Softw..

[10]  Rüdiger Kapitza,et al.  Hypervisor-Based Efficient Proactive Recovery , 2007, 2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007).

[11]  Miguel Correia,et al.  Spin One's Wheels? Byzantine Fault Tolerance with a Spinning Primary , 2009, 2009 28th IEEE International Symposium on Reliable Distributed Systems.

[12]  Mahadev Konar,et al.  ZooKeeper: Wait-free Coordination for Internet-scale Systems , 2010, USENIX ATC.

[13]  Leslie Lamport,et al.  Reaching Agreement in the Presence of Faults , 1980, JACM.

[14]  Yang Wang,et al.  All about Eve: Execute-Verify Replication for Multi-Core Servers , 2012, OSDI.

[15]  Michael Dahlin,et al.  Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults , 2009, NSDI.

[16]  Miguel Correia,et al.  How to tolerate half less one Byzantine nodes in practical distributed systems , 2004, Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004..

[17]  Arun Venkataramani,et al.  ZZ and the art of practical BFT execution , 2011, EuroSys '11.

[18]  R. Kapitza,et al.  Hybster - A Highly Parallelizable Protocol for Hybrid Fault-Tolerant Service Replication , 2017 .

[19]  Ramakrishna Kotla,et al.  High throughput Byzantine fault tolerance , 2004, International Conference on Dependable Systems and Networks, 2004.

[20]  Tobias Distler,et al.  Resource-Efficient Byzantine Fault Tolerance , 2016, IEEE Transactions on Computers.

[21]  John M. Rushby,et al.  Design and verification of secure systems , 1981, SOSP.

[22]  Robbert van Renesse,et al.  Byzantine Chain Replication , 2012, OPODIS.

[23]  Ramakrishna Kotla,et al.  Zyzzyva , 2007, SOSP.

[24]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[25]  Dong Zhou,et al.  Rex: replication at the speed of multi-core , 2014, EuroSys '14.

[26]  Michael K. Reiter,et al.  Fault-scalable Byzantine fault-tolerant services , 2005, SOSP '05.

[27]  Alysson Neves Bessani,et al.  State Machine Replication for the Masses with BFT-SMART , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[28]  Jacob R. Lorch,et al.  TrInc: Small Trusted Hardware for Large Distributed Systems , 2009, NSDI.

[29]  Arun Venkataramani,et al.  Separating agreement from execution for byzantine fault tolerant services , 2003, SOSP '03.

[30]  Elaine Shi,et al.  The Honey Badger of BFT Protocols , 2016, CCS.

[31]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1983, PODS '83.

[32]  Miguel Correia,et al.  EBAWA: Efficient Byzantine Agreement for Wide-Area Networks , 2010, 2010 IEEE 12th International Symposium on High Assurance Systems Engineering.

[33]  Carlos V. Rozas,et al.  Innovative instructions and software model for isolated execution , 2013, HASP '13.

[34]  Michael K. Reiter,et al.  Zzyzx: Scalable fault tolerance through Byzantine locking , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).

[35]  Johannes Behl,et al.  CheapBFT: resource-efficient byzantine fault tolerance , 2012, EuroSys '12.

[36]  Fernando Pedone,et al.  Rethinking State-Machine Replication for Parallelism , 2013, 2014 IEEE 34th International Conference on Distributed Computing Systems.

[37]  Marko Vukolic,et al.  The next 700 BFT protocols , 2010, EuroSys '10.

[38]  Vivien Quéma,et al.  RBFT: Redundant Byzantine Fault Tolerance , 2013, 2013 IEEE 33rd International Conference on Distributed Computing Systems.

[39]  Tobias Distler,et al.  SPARE: Replicas on Hold , 2011, NDSS.

[40]  André Schiper,et al.  Achieving High-Throughput State Machine Replication in Multi-core Systems , 2013, 2013 IEEE 33rd International Conference on Distributed Computing Systems.

[41]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1985, JACM.

[42]  Alysson Neves Bessani,et al.  From Byzantine Consensus to BFT State Machine Replication: A Latency-Optimal Transformation , 2012, 2012 Ninth European Dependable Computing Conference.

[43]  Scott Shenker,et al.  Attested append-only memory: making adversaries stick to their word , 2007, SOSP.

[44]  Ramakrishna Kotla,et al.  Zyzzyva: speculative byzantine fault tolerance , 2007, TOCS.

[45]  Marko Vukolic,et al.  The Next 700 BFT Protocols , 2015, ACM Trans. Comput. Syst..

[46]  Tobias Distler,et al.  Increasing performance in byzantine fault-tolerant systems with on-demand replica consistency , 2011, EuroSys '11.

[47]  Miguel Correia,et al.  Efficient Byzantine Fault-Tolerance , 2013, IEEE Transactions on Computers.