Efficient Byzantine Fault-Tolerance

We present two asynchronous Byzantine fault-tolerant state machine replication (BFT) algorithms, which improve previous algorithms in terms of several metrics. First, they require only 2f+1 replicas, instead of the usual 3f+1. Second, the trusted service in which this reduction of replicas is based is quite simple, making a verified implementation straightforward (and even feasible using commercial trusted hardware). Third, in nice executions the two algorithms run in the minimum number of communication steps for nonspeculative and speculative algorithms, respectively, four and three steps. Besides the obvious benefits in terms of cost, resilience and management complexity-fewer replicas to tolerate a certain number of faults-our algorithms are simpler than previous ones, being closer to crash fault-tolerant replication algorithms. The performance evaluation shows that, even with the trusted component access overhead, they can have better throughput than Castro and Liskov's PBFT, and better latency in networks with nonnegligible communication delays.

[1]  Leslie Lamport,et al.  Reaching Agreement in the Presence of Faults , 1980, JACM.

[2]  Leslie Lamport,et al.  The Byzantine Generals Problem , 1982, TOPL.

[3]  Gabriel Bracha,et al.  An asynchronous [(n - 1)/3]-resilient consensus protocol , 1984, PODC '84.

[4]  David Powell,et al.  A fault- and intrusion- tolerant file system , 1985 .

[5]  Morrie Gasser,et al.  Building a Secure Computer System , 1988 .

[6]  Nancy A. Lynch,et al.  Consensus in the presence of partial synchrony , 1988, JACM.

[7]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[8]  Michael K. Reiter,et al.  The Rampart Toolkit for Building High-Integrity Services , 1994, Dagstuhl Seminar on Distributed Systems.

[9]  Matthew K. Franklin,et al.  The Ω key management service , 1996, CCS '96.

[10]  Hugo Krawczyk,et al.  HMAC: Keyed-Hashing for Message Authentication , 1997, RFC.

[11]  Miguel Correia,et al.  The Design of a COTS Real-Time Distributed Security Kernel (Extended Version) , 2001 .

[12]  David E. Culler,et al.  SEDA: an architecture for well-conditioned, scalable internet services , 2001, SOSP.

[13]  Michael Dahlin,et al.  Minimal Byzantine Storage , 2002, DISC.

[14]  Miguel Castro,et al.  Practical byzantine fault tolerance and proactive recovery , 2002, TOCS.

[15]  Jacob R. Lorch,et al.  Farsite: federated, available, and reliable storage for an incompletely trusted environment , 2002, OSDI '02.

[16]  Tal Garfinkel,et al.  Terra: a virtual machine-based platform for trusted computing , 2003, SOSP '03.

[17]  Robbert van Renesse,et al.  COCA: a secure distributed online certification authority , 2002, Foundations of Intrusion Tolerant Systems, 2003 [Organically Assured and Survivable Information Systems].

[18]  Ravishankar K. Iyer,et al.  Transparent runtime randomization for security , 2003, 22nd International Symposium on Reliable Distributed Systems, 2003. Proceedings..

[19]  HarrisTim,et al.  Xen and the art of virtualization , 2003 .

[20]  Arun Venkataramani,et al.  Separating agreement from execution for byzantine fault tolerant services , 2003, SOSP '03.

[21]  Salim Hariri,et al.  Randomized Instruction Set Emulation To Disrupt Binary Code Injection Attacks , 2003 .

[22]  Christian Cachin,et al.  Secure distributed DNS , 2004, International Conference on Dependable Systems and Networks, 2004.

[23]  Miguel Correia,et al.  How to tolerate half less one Byzantine nodes in practical distributed systems , 2004, Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004..

[24]  Michael Dahlin,et al.  BAR fault tolerance for cooperative services , 2005, SOSP '05.

[25]  Paulo Veríssimo,et al.  Travelling through wormholes: a new look at distributed systems models , 2006, SIGA.

[26]  Srinivas Devadas,et al.  Virtual monotonic counters and count-limited objects using a TPM without a trusted OS , 2006, STC '06.

[27]  Calton Pu,et al.  Reducing TCB complexity for security-sensitive applications: three case studies , 2006, EuroSys.

[28]  Liuba Shrira,et al.  HQ replication: a hybrid quorum protocol for byzantine fault tolerance , 2006, OSDI '06.

[29]  Jean-Philippe Martin,et al.  Fast Byzantine Consensus , 2006, IEEE Transactions on Dependable and Secure Computing.

[30]  Miguel Correia,et al.  How Practical Are Intrusion-Tolerant Distributed Systems? , 2006 .

[31]  Stefan Berger,et al.  vTPM: Virtualizing the Trusted Platform Module , 2006, USENIX Security Symposium.

[32]  Rüdiger Kapitza,et al.  Hypervisor-Based Efficient Proactive Recovery , 2007, 2007 26th IEEE International Symposium on Reliable Distributed Systems (SRDS 2007).

[33]  Srinivas Devadas,et al.  Offline untrusted storage with immediate detection of forking and replay attacks , 2007, STC '07.

[34]  David Mazières,et al.  Beyond One-Third Faulty Replicas in Byzantine Fault Tolerant Systems , 2007, NSDI.

[35]  Keith Marzullo,et al.  Classic Paxos vs. fast Paxos: caveat emptor , 2007 .

[36]  Scott Shenker,et al.  Attested append-only memory: making adversaries stick to their word , 2007, SOSP.

[37]  Miguel Correia,et al.  DepSpace: a byzantine fault-tolerant coordination service , 2008, Eurosys '08.

[38]  Michael K. Reiter,et al.  How low can you go?: recommendations for hardware-supported minimal TCB code execution , 2008, ASPLOS.

[39]  Atul Singh,et al.  BFT Protocols Under Fire , 2008, NSDI.

[40]  Michael K. Reiter,et al.  Flicker: an execution infrastructure for tcb minimization , 2008, Eurosys '08.

[41]  Sangmin Lee,et al.  Upright cluster services , 2009, SOSP '09.

[42]  Ramakrishna Kotla,et al.  Zyzzyva: speculative byzantine fault tolerance , 2007, TOCS.

[43]  John Lane,et al.  Steward: Scaling Byzantine Fault-Tolerant Replication to Wide Area Networks , 2010, IEEE Transactions on Dependable and Secure Computing.

[44]  Alysson Neves Bessani,et al.  OS diversity for intrusion tolerance: Myth or reality? , 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN).

[45]  Rodrigo Rodrigues,et al.  Efficient middleware for byzantine fault tolerant database replication , 2011, EuroSys '11.