A decentralized algorithm for erasure-coded virtual disks

A federated array of bricks is a scalable distributed storage system composed of inexpensive storage bricks. It achieves high reliability at low cost by erasure coding data across the bricks, so that the data survives brick failures. Erasure coding generates n encoded blocks from m data blocks (n > m) and permits the data blocks to be reconstructed from any m of these encoded blocks. We present a new, fully decentralized erasure-coding algorithm for an asynchronous distributed system. The algorithm provides fully linearizable read-write access to erasure-coded data and supports concurrent I/O controllers that may crash and recover. It relies on a novel quorum construction in which any two quorums intersect in at least m processes.
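The intersection requirement can be made concrete with a small sketch. It assumes the simplest possible construction, in which every set of at least q of the n processes counts as a quorum; this is an illustrative assumption and not necessarily the construction used by the algorithm. Under that assumption, two quorums of size q overlap in at least 2q - n processes, so q must be at least (n + m) / 2 rounded up. The function names `min_quorum_size` and `min_intersection` are hypothetical and introduced only for this example.

```python
from itertools import combinations


def min_quorum_size(n: int, m: int) -> int:
    """Smallest q such that any two q-subsets of n processes share >= m members.

    Assumes size-threshold quorums (any q processes form a quorum), which is
    an illustrative choice, not necessarily the paper's construction.
    """
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    # Two subsets of size q overlap in at least 2q - n elements,
    # so we need 2q - n >= m, i.e. q >= ceil((n + m) / 2).
    return (n + m + 1) // 2


def min_intersection(n: int, q: int) -> int:
    """Brute-force the smallest overlap between any two q-subsets of range(n)."""
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)


if __name__ == "__main__":
    n, m = 5, 3                      # 5 encoded blocks, any 3 reconstruct the data
    q = min_quorum_size(n, m)
    print("quorum size:", q)
    print("worst-case overlap:", min_intersection(n, q))
    print("crashed processes tolerated:", n - q)
```

With n = 5 and m = 3 the sketch yields a quorum size of 4, so any two quorums share at least 3 processes, enough encoded blocks to reconstruct the data, while one process may be unavailable without blocking progress.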
