Efficient Byzantine-tolerant erasure-coded storage

This paper describes a decentralized consistency protocol for survivable storage that exploits local data versioning within each storage-node. Such versioning enables the protocol to efficiently provide linearizability and wait-freedom of read and write operations to erasure-coded data in asynchronous environments with Byzantine failures of clients and servers. By exploiting versioning storage-nodes, the protocol shifts most work to clients and allows highly optimistic operation: reads occur in a single round-trip unless clients observe concurrency or write failures. Measurements of a storage system prototype using this protocol show that it scales well with the number of failures tolerated, and its performance compares favorably with an efficient implementation of Byzantine-tolerant state machine replication.
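The optimistic read path described above can be illustrated with a minimal sketch. It is not the paper's protocol: the names `StorageNode`, `optimistic_read`, and `complete_threshold`, the integer timestamps, and the rule "a version is complete when at least `complete_threshold` nodes return it" are all assumptions made for illustration. The sketch omits erasure decoding and any validation of responses from Byzantine nodes or clients. It only shows why versioning lets a reader finish in one round-trip in the common case and fall back to an earlier version when it observes a partial write.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical versioning node: retains every fragment it receives, keyed by timestamp."""
    versions: dict = field(default_factory=dict)  # timestamp -> fragment

    def write(self, timestamp, fragment):
        # Older versions are kept rather than overwritten.
        self.versions[timestamp] = fragment

    def read_latest(self, before=None):
        """Return (timestamp, fragment) for the newest version, optionally older than `before`."""
        candidates = [t for t in self.versions if before is None or t < before]
        if not candidates:
            return None
        newest = max(candidates)
        return newest, self.versions[newest]


def optimistic_read(nodes, complete_threshold):
    """Return (timestamp, fragments, round_trips) for the newest version that at least
    `complete_threshold` nodes hold, or (None, [], round_trips) if no such version exists.

    Assumption: `complete_threshold` stands in for the protocol's real classification rule.
    """
    before = None
    round_trips = 0
    while True:
        round_trips += 1
        replies = [r for r in (n.read_latest(before) for n in nodes) if r is not None]
        if not replies:
            return None, [], round_trips
        counts = Counter(t for t, _ in replies)
        newest = max(counts)
        if counts[newest] >= complete_threshold:
            fragments = [f for t, f in replies if t == newest]
            return newest, fragments, round_trips
        # Too few nodes hold the newest version (concurrent or failed write):
        # ask again for versions older than it.
        before = newest


nodes = [StorageNode() for _ in range(4)]
for i, node in enumerate(nodes):
    node.write(1, f"frag-v1-{i}")       # complete write of version 1
clean = optimistic_read(nodes, complete_threshold=3)

nodes[0].write(2, "frag-v2-0")          # partial write of version 2
after_partial = optimistic_read(nodes, complete_threshold=3)
```

In this toy run the first read completes in one round-trip, while the read after the partial write needs a second round-trip to fall back to version 1. That fallback is possible only because each node kept the older version.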
