Practical Byzantine fault tolerance

This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantine-fault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbitrary behavior. Whereas previous algorithms assumed a synchronous system or were too slow to be used in practice, the algorithm described in this paper is practical: it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude. We implemented a Byzantine-fault-tolerant NFS service using our algorithm and measured its performance. The results show that our service is only 3% slower than a standard unreplicated NFS.
