Minimal Byzantine Fault Tolerance: Algorithm and Evaluation

This paper presents two asynchronous Byzantine fault-tolerant state machine replication (BFT) algorithms that are minimal in several senses. First, they require only 2f + 1 replicas, instead of the usual 3f + 1. Second, the trusted service on which this reduction in replicas is based is arguably minimal, so it is simple to verify and to implement (which is possible even with commercial trusted hardware). Third, in nice executions the two algorithms run in the minimum number of communication steps for non-speculative and speculative algorithms: 4 and 3 steps, respectively. Besides the obvious benefits in cost, resilience and management complexity of needing fewer replicas to tolerate a given number of faults, our algorithms are simpler than previous ones (being closer to crash fault-tolerant replication algorithms). The performance evaluation shows that, even with the overhead of accessing the trusted component, they can achieve better throughput than Castro and Liskov’s PBFT, and better latency in networks with non-negligible communication delays. Compared with the previous paper, DI-TR-08-29 [49], this version presents slight modifications of the algorithms, the full proof of their correctness and a new performance evaluation.
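To make the replica savings concrete, the following minimal Python sketch works through the arithmetic of the two bounds stated above: 2f + 1 replicas when the trusted service is available versus 3f + 1 without it. The function names `replicas_needed` and `faults_tolerated` and the `trusted_component` flag are hypothetical illustrations, not part of the paper. The sketch only counts replicas; it says nothing about how the trusted service actually enforces correct behavior.

```python
# Hypothetical sketch: replica arithmetic for tolerating f Byzantine faults.
# trusted_component=True models the 2f + 1 bound; False models the usual 3f + 1.


def replicas_needed(f: int, trusted_component: bool) -> int:
    """Minimum number of replicas needed to tolerate f Byzantine faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1 if trusted_component else 3 * f + 1


def faults_tolerated(n: int, trusted_component: bool) -> int:
    """Largest f such that replicas_needed(f, trusted_component) <= n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 2 if trusted_component else (n - 1) // 3


if __name__ == "__main__":
    for f in range(1, 4):
        classic = replicas_needed(f, trusted_component=False)
        minimal = replicas_needed(f, trusted_component=True)
        # The difference is always exactly f replicas.
        print(f"f={f}: {classic} -> {minimal} replicas (saves {classic - minimal})")
```

Running it prints `f=1: 4 -> 3 replicas (saves 1)`, `f=2: 7 -> 5 replicas (saves 2)` and `f=3: 10 -> 7 replicas (saves 3)`. Read the other way, a fixed deployment of 7 replicas tolerates 3 faults under the 2f + 1 bound but only 2 under the 3f + 1 bound.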
