On the Practicality of Practical Byzantine Fault Tolerance

Byzantine Fault Tolerant (BFT) systems are considered the state of the art with regard to providing reliability in distributed systems. Despite over a decade of research, however, BFT systems are rarely used in practice. In this paper, we describe our experience, from an application developer's perspective, trying to leverage the publicly available, highly studied and extended "PBFT" middleware (by Castro and Liskov) to provide provable reliability guarantees for an electronic voting application with high security and robustness needs. We describe several obstacles we encountered and drawbacks we identified in the PBFT approach. These include some that we tackled, such as the lack of support for dynamic client management and the fact that state management is left completely up to the application. Others that still remain include the lack of robust handling of non-determinism, lack of support for web-based applications, lack of support for stronger cryptographic primitives, and more. We find that, while many of the obstacles could be overcome, doing so requires significant engineering effort and time, and the performance implications for the end application are unclear. An application developer is thus unlikely to be willing to invest the time and effort needed to leverage the BFT approach.
