Proactive recovery in a Byzantine-fault-tolerant system

This paper describes an asynchronous state-machine replication system that tolerates Byzantine faults, which can be caused by malicious attacks or software errors. Our system is the first to recover Byzantine-faulty replicas proactively and it performs well because it uses symmetric rather than public-key cryptography for authentication. The recovery mechanism allows us to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a window of vulnerability that is small under normal conditions. The window may increase under a denial-of-service attack but we can detect and respond to such attacks. The paper presents results of experiments showing that overall performance is good and that even a small window of vulnerability has little impact on service latency.

[1]  Mihir Bellare,et al.  Optimal Asymmetric Encryption-How to Encrypt with RSA , 1995 .

[2]  Hugh C. Williams,et al.  A modification of the RSA public-key encryption procedure (Corresp.) , 1980, IEEE Trans. Inf. Theory.

[3]  Tal Rabin,et al.  Secure distributed storage and retrieval , 2000, Theor. Comput. Sci..

[4]  Michael K. Reiter,et al.  A high-throughput secure reliable multicast protocol , 1996, Proceedings 9th IEEE Computer Security Foundations Workshop.

[5]  Leslie Lamport,et al.  The part-time parliament , 1998, TOCS.

[6]  Michael K. Reiter,et al.  A high-throughput secure reliable multicast protocol , 1996, Proceedings 9th IEEE Computer Security Foundations Workshop.

[7]  Maurice Herlihy,et al.  Axioms for concurrent objects , 1987, POPL '87.

[8]  David Mazières,et al.  Separating key management from file system security , 1999, SOSP.

[9]  Michael K. Reiter A Secure Group Membership Protocol , 1996, IEEE Trans. Software Eng..

[10]  Michael K. Reiter,et al.  Secure and scalable replication in Phalanx , 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281).

[11]  Mihir Bellare,et al.  The Exact Security of Digital Signatures - HOw to Sign with RSA and Rabin , 1996, EUROCRYPT.

[12]  ZHANGLi-xia,et al.  A reliable multicast framework for light-weight sessions and application level framing , 1995 .

[13]  Hugo Krawczyk,et al.  UMAC: Fast and Secure Message Authentication , 1999, CRYPTO.

[14]  David H. Ackley,et al.  Building diverse computer systems , 1997, Proceedings. The Sixth Workshop on Hot Topics in Operating Systems (Cat. No.97TB100133).

[15]  Eli Biham,et al.  TIGER: A Fast New Hash Function , 1996, FSE.

[16]  Michael Williams,et al.  Replication in the harp file system , 1991, SOSP '91.

[17]  Mahadev Satyanarayanan,et al.  Scale and performance in a distributed file system , 1987, SOSP '87.

[18]  Rafail Ostrovsky,et al.  How To Withstand Mobile Virus Attacks , 1991, PODC 1991.

[19]  Ran Canetti,et al.  Maintaining Authenticated Communication in the Presence of Break-Ins , 1997, PODC '97.

[20]  Steven McCanne,et al.  A reliable multicast framework for light-weight sessions and application level framing , 1995, SIGCOMM '95.

[21]  Miguel Oom Temudo de Castro,et al.  Practical Byzantine fault tolerance , 1999, OSDI '99.

[22]  Miguel Castro,et al.  A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm , 1999 .

[23]  Yoram Moses,et al.  Fully Polynomial Byzantine Agreement for n > 3t Processors in t + 1 Rounds , 1998, SIAM J. Comput..

[24]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[25]  Sam Toueg,et al.  Asynchronous consensus and broadcast protocols , 1985, JACM.

[26]  Louise E. Moser,et al.  The SecureRing protocols for securing group communication , 1998, Proceedings of the Thirty-First Hawaii International Conference on System Sciences.

[27]  Dan Walsh,et al.  Design and implementation of the Sun network filesystem , 1985, USENIX Conference Proceedings.

[28]  Mihir Bellare,et al.  A New Paradigm for Collision-Free Hashing: Incrementality at Reduced Cost , 1997, EUROCRYPT.

[29]  Christian S. Collberg,et al.  Watermarking, Tamper-Proofing, and Obfuscation-Tools for Software Protection , 2002, IEEE Trans. Software Eng..

[30]  Gene Tsudik Message authentication with one-way hash functions , 1992, CCRV.

[31]  Li Gong,et al.  A security risk of depending on synchronized clocks , 1992, OPSR.

[32]  Hugo Krawczyk,et al.  Proactive Secret Sharing Or: How to Cope With Perpetual Leakage , 1995, CRYPTO.

[33]  Fred B. Schneider,et al.  Implementing fault-tolerant services using the state machine approach: a tutorial , 1990, CSUR.

[34]  Barbara Liskov,et al.  Viewstamped Replication: A New Primary Copy Method to Support Highly-Available Distributed Systems , 1999, PODC '88.

[35]  Michael K. Reiter,et al.  The Rampart Toolkit for Building High-Integrity Services , 1994, Dagstuhl Seminar on Distributed Systems.

[36]  Ronald L. Rivest,et al.  The MD5 Message-Digest Algorithm , 1992, RFC.

[37]  Rafail Ostrovsky,et al.  How to withstand mobile virus attacks (extended abstract) , 1991, PODC '91.

[38]  André Schiper,et al.  Lightweight causal and atomic group multicast , 1991, TOCS.

[39]  John K. Ousterhout,et al.  Why Aren't Operating Systems Getting Faster As Fast as Hardware? , 1990, USENIX Summer.

[40]  Markus Jakobsson,et al.  Proactive public key and signature systems , 1997, CCS '97.