CheapBFT: resource-efficient byzantine fault tolerance

One of the main reasons why Byzantine fault-tolerant (BFT) systems are not widely used lies in their high resource consumption: 3f+1 replicas are necessary to tolerate only f faults. Recent works have been able to reduce the minimum number of replicas to 2f+1 by relying on a trusted subsystem that prevents a replica from making conflicting statements to other replicas without being detected. Nevertheless, having been designed with the focus on fault handling, these systems still employ a majority of replicas during normal-case operation for seemingly redundant work. Furthermore, the trusted subsystems available trade off performance for security; that is, they either achieve high throughput or they come with a small trusted computing base. This paper presents CheapBFT, a BFT system that, for the first time, tolerates that all but one of the replicas active in normal-case operation become faulty. CheapBFT runs a composite agreement protocol and exploits passive replication to save resources; in the absence of faults, it requires that only f+1 replicas actively agree on client requests and execute them. In case of suspected faulty behavior, CheapBFT triggers a transition protocol that activates f extra passive replicas and brings all non-faulty replicas into a consistent state again. This approach, for example, allows the system to safely switch to another, more resilient agreement protocol. CheapBFT relies on an FPGA-based trusted subsystem for the authentication of protocol messages that provides high performance and comprises a small trusted computing base.

[1]  Fred B. Schneider,et al.  Implementing trustworthy services using replicated state machines , 2005, IEEE Security & Privacy Magazine.

[2]  Michael Dahlin,et al.  Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults , 2009, NSDI.

[3]  Arun Venkataramani,et al.  ZZ and the art of practical BFT execution , 2011, EuroSys '11.

[4]  Rüdiger Kapitza,et al.  Hypervisor-Based Efficient Proactive Recovery , 2007, 2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007).

[5]  Heiko Stamer,et al.  A Software-Based Trusted Platform Module Emulator , 2008, TRUST.

[6]  Peter Y. K. Cheung,et al.  Fault tolerance and reliability in field-programmable gate arrays , 2010, IET Computers & Digital Techniques.

[7]  Tobias Distler,et al.  Increasing performance in byzantine fault-tolerant systems with on-demand replica consistency , 2011, EuroSys '11.

[8]  Miguel Correia,et al.  Efficient Byzantine Fault-Tolerance , 2013, IEEE Transactions on Computers.

[9]  Scott Shenker,et al.  Attested append-only memory: making adversaries stick to their word , 2007, SOSP.

[10]  Andreas Haeberlen,et al.  Practical accountability for distributed systems , 2007 .

[11]  Jacob R. Lorch,et al.  TrInc: Small Trusted Hardware for Large Distributed Systems , 2009, NSDI.

[12]  Arun Venkataramani,et al.  Separating agreement from execution for byzantine fault tolerant services , 2003, SOSP '03.

[13]  Marko Vukolic,et al.  The next 700 BFT protocols , 2010, EuroSys '10.

[14]  Ariel J. Feldman,et al.  SPORC: Group Collaboration using Untrusted Cloud Resources , 2010, OSDI.

[15]  Miguel Correia,et al.  Spin One's Wheels? Byzantine Fault Tolerance with a Spinning Primary , 2009, 2009 28th IEEE International Symposium on Reliable Distributed Systems.

[16]  Miguel Correia,et al.  How to tolerate half less one Byzantine nodes in practical distributed systems , 2004, Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004..

[17]  Ahmad-Reza Sadeghi,et al.  Reconfigurable trusted computing in hardware , 2007, STC '07.

[18]  Tobias Distler,et al.  SPARE: Replicas on Hold , 2011, NDSS.

[19]  Christian Cachin,et al.  Distributing trust on the Internet , 2001, 2001 International Conference on Dependable Systems and Networks.

[20]  Idit Keidar,et al.  Venus: verification for untrusted cloud storage , 2010, CCSW '10.

[21]  Peter Williams,et al.  The Blind Stone Tablet: Outsourcing Durability to Untrusted Parties , 2009, NDSS.

[22]  Paul England,et al.  Para-Virtualized TPM Sharing , 2008, TRUST.

[23]  Ramakrishna Kotla,et al.  Zyzzyva: speculative byzantine fault tolerance , 2007, TOCS.

[24]  Mahadev Konar,et al.  ZooKeeper: Wait-free Coordination for Internet-scale Systems , 2010, USENIX ATC.

[25]  Petr Kuznetsov,et al.  BFTW3: why? when? where? workshop on the theory and practice of byzantine fault tolerance , 2010, SIGA.

[26]  Miguel Castro,et al.  Practical byzantine fault tolerance and proactive recovery , 2002, TOCS.

[27]  Jehan-François Pâris,et al.  Voting with Witnesses: A Constistency Scheme for Replicated Files , 1986, ICDCS.

[28]  Leslie Lamport,et al.  Cheap Paxos , 2004, International Conference on Dependable Systems and Networks, 2004.

[29]  Edmund L. Wong,et al.  BFT: the time is now , 2008, LADIS '08.

[30]  Ramakrishna Kotla,et al.  High throughput Byzantine fault tolerance , 2004, International Conference on Dependable Systems and Networks, 2004.

[31]  Stefan Berger,et al.  vTPM: Virtualizing the Trusted Platform Module , 2006, USENIX Security Symposium.

[32]  Udo Steinberg,et al.  NOVA: a microhypervisor-based secure virtualization architecture , 2010, EuroSys '10.

[33]  Miguel Correia,et al.  EBAWA: Efficient Byzantine Agreement for Wide-Area Networks , 2010, 2010 IEEE 12th International Symposium on High Assurance Systems Engineering.

[34]  Andreas Haeberlen,et al.  PeerReview: practical accountability for distributed systems , 2007, SOSP.