Fault-scalable Byzantine fault-tolerant services

A fault-scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance. The Query/Update (Q/U) protocol is a new tool that enables construction of fault-scalable Byzantine fault-tolerant services. The optimistic quorum-based nature of the Q/U protocol allows it to provide better throughput and fault-scalability than replicated state machines using agreement-based protocols. A prototype service built using the Q/U protocol outperforms the same service built using a popular replicated state machine implementation at all system sizes in experiments that permit an optimistic execution. Moreover, the performance of the Q/U protocol decreases by only 36% as the number of Byzantine faults tolerated increases from one to five, whereas the performance of the replicated state machine decreases by 83%.
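The two percentages above can be restated as the fraction of one-fault performance each system keeps at five faults. The following is a minimal arithmetic sketch based only on the figures reported in the abstract. The function name `retained_fraction` and the dictionary `REPORTED_DECREASE` are hypothetical, chosen for illustration, and the result says nothing about absolute throughput.

```python
def retained_fraction(decrease_percent: float) -> float:
    """Fraction of baseline performance left after a given percentage decrease."""
    if not 0.0 <= decrease_percent <= 100.0:
        raise ValueError("decrease_percent must be between 0 and 100")
    return 1.0 - decrease_percent / 100.0


# Percentage decreases reported in the abstract as the number of tolerated
# Byzantine faults grows from one to five (illustrative container name).
REPORTED_DECREASE = {
    "Q/U protocol": 36.0,
    "replicated state machine": 83.0,
}

if __name__ == "__main__":
    for system, decrease in REPORTED_DECREASE.items():
        kept = retained_fraction(decrease)
        print(f"{system}: keeps {kept:.0%} of its one-fault performance")
```

Under these figures, the Q/U prototype keeps roughly 64% of its one-fault performance at five faults, while the replicated state machine keeps roughly 17%. Each figure is relative to that system's own baseline.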
