A coded shared atomic memory algorithm for message passing architectures

This paper considers the communication and storage costs of emulating atomic (linearizable) multi-writer multi-reader shared memory in distributed message-passing systems. The paper contains three main contributions: (1) We present an atomic shared-memory emulation algorithm called Coded Atomic Storage (CAS), which uses erasure coding. In a storage system with N servers that is resilient to f server failures, we show that the communication cost of CAS is $$\frac{N}{N-2f}$$. The storage cost of CAS is unbounded. (2) We present a modification of the CAS algorithm known as CAS with garbage collection (CASGC). The CASGC algorithm is parameterized by an integer $$\delta$$ and has a bounded storage cost. We show that the CASGC algorithm satisfies atomicity. In every execution of CASGC where the number of server failures is at most f, we show that every write operation invoked at a non-failing client terminates. We also show that in an execution of CASGC with parameter $$\delta$$ where the number of server failures is at most f, a read operation terminates provided that the number of write operations concurrent with the read is at most $$\delta$$. We explicitly characterize the storage cost of CASGC, and show that it has the same communication cost as CAS. (3) We describe an algorithm known as the Communication Cost Optimal Atomic Storage (CCOAS) algorithm that achieves a smaller communication cost than CAS and CASGC. In particular, CCOAS incurs read and write communication costs of $$\frac{N}{N-f}$$, measured in terms of the number of object values. We also discuss drawbacks of CCOAS as compared with CAS and CASGC.
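To make the two cost expressions concrete, the following minimal Python sketch evaluates them for a few small configurations. It assumes the costs are exactly the ratios stated above, in units of object values; the function names and the parameter checks (for example, requiring N > 2f so that the CAS denominator is positive) are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def cas_comm_cost(n: int, f: int) -> Fraction:
    """Communication cost N / (N - 2f) stated for CAS and CASGC.

    Assumption: N > 2f, so the denominator is positive.
    """
    if f < 0 or n <= 2 * f:
        raise ValueError("expected f >= 0 and N > 2f")
    return Fraction(n, n - 2 * f)


def ccoas_comm_cost(n: int, f: int) -> Fraction:
    """Communication cost N / (N - f) stated for CCOAS.

    Assumption: N > f, so the denominator is positive.
    """
    if f < 0 or n <= f:
        raise ValueError("expected f >= 0 and N > f")
    return Fraction(n, n - f)


if __name__ == "__main__":
    # Hypothetical (N, f) pairs chosen only for illustration.
    for n, f in [(5, 1), (7, 2), (10, 3)]:
        cas = cas_comm_cost(n, f)
        ccoas = ccoas_comm_cost(n, f)
        print(f"N={n}, f={f}: CAS/CASGC = {cas}, CCOAS = {ccoas}")
```

For N = 5 and f = 1, the sketch gives 5/3 for CAS and CASGC and 5/4 for CCOAS, which illustrates the gap between the two expressions whenever f > 0.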
