hBFT: Speculative Byzantine Fault Tolerance with Minimum Cost

We present hBFT, a hybrid, Byzantine fault-tolerant, replicated state machine protocol with optimal resilience. Under normal circumstances, hBFT uses speculation, i.e., replicas directly adopt the order from the primary and send replies to the clients. As in prior work such as Zyzzyva, when replicas are out of order, clients can detect the inconsistency and help replicas converge on the total ordering. However, we take a different approach than previous work that has four distinct benefits: it requires many fewer cryptographic operations, it moves critical jobs to the clients with no additional costs, faulty clients can be detected and identified, and performance in the presence of client participation will not degrade as long as the primary is correct. The correctness is guaranteed by a three-phase checkpoint subprotocol similar to PBFT, which is tailored to our needs. The protocol is triggered by the primary when a certain number of requests are executed or by clients when they detect an inconsistency.

[1]  Michael Dahlin,et al.  Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults , 2009, NSDI.

[2]  Sam Toueg,et al.  Unreliable failure detectors for reliable distributed systems , 1996, JACM.

[3]  Scott Shenker,et al.  Attested append-only memory: making adversaries stick to their word , 2007, SOSP.

[4]  Hugo Krawczyk,et al.  Keying Hash Functions for Message Authentication , 1996, CRYPTO.

[5]  Marko Vukolic,et al.  The Next 700 BFT Protocols , 2015, ACM Trans. Comput. Syst..

[6]  Mihir Bellare,et al.  The Exact Security of Digital Signatures - HOw to Sign with RSA and Rabin , 1996, EUROCRYPT.

[7]  Ramakrishna Kotla,et al.  Zyzzyva , 2007, SOSP.

[8]  Nancy A. Lynch,et al.  Consensus in the presence of partial synchrony , 1988, JACM.

[9]  Miguel Oom Temudo de Castro,et al.  Practical Byzantine fault tolerance , 1999, OSDI '99.

[10]  John Lane,et al.  Byzantine replication under attack , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[11]  Miguel Castro,et al.  Practical byzantine fault tolerance and proactive recovery , 2002, TOCS.

[12]  Jean-Philippe Martin,et al.  Fast Byzantine Consensus , 2006, IEEE Transactions on Dependable and Secure Computing.

[13]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1983, PODS '83.

[14]  Piotr Zielinski,et al.  Optimistically Terminating Consensus: All Asynchronous Consensus Protocols in One Framework , 2006, 2006 Fifth International Symposium on Parallel and Distributed Computing.

[15]  Michael K. Reiter,et al.  Byzantine quorum systems , 1997, STOC '97.

[16]  Mihir Bellare,et al.  New Proofs for NMAC and HMAC: Security without Collision Resistance , 2006, Journal of Cryptology.

[17]  Nancy G. Leveson,et al.  An experimental evaluation of the assumption of independence in multiversion programming , 1986, IEEE Transactions on Software Engineering.

[18]  Andreas Haeberlen,et al.  The Case for Byzantine Fault Detection , 2006, HotDep.

[19]  Rachid Guerraoui,et al.  Encapsulating Failure Detection: From Crash to Byzantine Failures , 2002, Ada-Europe.

[20]  Michael K. Reiter,et al.  Fault-scalable Byzantine fault-tolerant services , 2005, SOSP '05.

[21]  Piotr Zielinski Low-latency atomic broadcast in the presence of contention , 2007, Distributed Computing.

[22]  K. Marzullo,et al.  Towards Low Latency State Machine Replication for Uncivil Wide-area Networks , 2009 .

[23]  Jonathan Kirsch,et al.  Scaling Byzantine Fault-Tolerant Replication toWide Area Networks , 2006, International Conference on Dependable Systems and Networks (DSN'06).

[24]  Leslie Lamport,et al.  The Byzantine Generals Problem , 1982, TOPL.

[25]  Miguel Castro,et al.  Using abstraction to improve fault tolerance , 2001, Proceedings Eighth Workshop on Hot Topics in Operating Systems.

[26]  Miguel Castro,et al.  BASE: using abstraction to improve fault tolerance , 2001, SOSP.

[27]  Leslie Lamport,et al.  Fast Paxos , 2006, Distributed Computing.

[28]  Sam Toueg,et al.  The weakest failure detector for solving consensus , 1992, PODC '92.

[29]  Liuba Shrira,et al.  HQ replication: a hybrid quorum protocol for byzantine fault tolerance , 2006, OSDI '06.

[30]  Ramakrishna Kotla,et al.  Zyzzyva: speculative byzantine fault tolerance , 2007, TOCS.

[31]  Miguel Correia,et al.  Spin One's Wheels? Byzantine Fault Tolerance with a Spinning Primary , 2009, 2009 28th IEEE International Symposium on Reliable Distributed Systems.

[32]  Leslie Lamport,et al.  Lower bounds for asynchronous consensus , 2006, Distributed Computing.

[33]  Leslie Lamport,et al.  The part-time parliament , 1998, TOCS.

[34]  Michael K. Reiter,et al.  Zzyzx: Scalable fault tolerance through Byzantine locking , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).

[35]  Sam Toueg,et al.  Unreliable failure detectors for asynchronous systems (preliminary version) , 1991, PODC '91.

[36]  Arun Venkataramani,et al.  Separating agreement from execution for byzantine fault tolerant services , 2003, SOSP '03.

[37]  Mike Hibler,et al.  An integrated experimental environment for distributed systems and networks , 2002, OPSR.

[38]  Michel Raynal,et al.  A simple and fast asynchronous consensus protocol based on a weak failure detector , 1999, Distributed Computing.