Paper information

Multi-Version Coding—An Information-Theoretic Perspective of Consistent Distributed Storage

In applications of distributed storage systems to distributed computing and implementation of key-value stores, the following property, usually referred to as consistency in distributed computing, is an important requirement: as the data stored changes, the latest version of the data must be accessible to a client that connects to the storage system. Motivated by technological trends where key-value stores are increasingly implemented in high-speed memory, an information theoretic formulation called multi-version coding is introduced in this paper in order to understand and minimize the memory overhead of consistent distributed storage. Multi-version coding is characterized by $\nu$ totally ordered versions of a message and a storage system with $n$ servers. At each server, values corresponding to an arbitrary subset of the $\nu$ versions are received and encoded. For any subset of $c$ servers in the storage system, the value corresponding to the latest common version or a later version, as per the total ordering, among the $c$ servers is required to be decodable. An achievable multi-version code construction via linear coding and a converse result that shows that the construction is asymptotically tight when $\nu \mid (c-1)$ are provided. An implication of the converse is that there is an inevitable price, in terms of storage cost, to ensure consistency in distributed storage systems.
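The decoding requirement stated in the abstract can be illustrated with a small sketch. It is not the paper's code construction; it only checks which versions a decoder is allowed to return for a given set of $c$ servers. The function names, the use of integers for the totally ordered versions, and the convention that nothing is required when the servers share no version are all assumptions made for illustration.

```python
from typing import Iterable, Optional, Set


def latest_common_version(states: Iterable[Set[int]]) -> Optional[int]:
    """Return the largest version received by every server, or None if none is shared.

    Each element of `states` is the set of version indices one server has
    received (hypothetical representation: versions are integers 1..nu).
    """
    states = list(states)
    if not states:
        return None
    common = set.intersection(*states)
    return max(common) if common else None


def meets_requirement(decoded: Optional[int], states: Iterable[Set[int]]) -> bool:
    """Check whether `decoded` is the latest common version or a later one.

    Assumption: if the servers have no version in common, any output
    (including None) is acceptable.
    """
    target = latest_common_version(states)
    if target is None:
        return True
    return decoded is not None and decoded >= target


if __name__ == "__main__":
    # Three servers (c = 3), three versions (nu = 3).
    states = [{1, 2}, {1, 3}, {1, 2, 3}]
    print(latest_common_version(states))   # version 1 is the only shared one
    print(meets_requirement(3, states))    # a later version is also acceptable
```

In this example the three servers share only version 1, so returning version 1, 2, or 3 would satisfy the requirement, while the stored data must make at least one of them recoverable for every choice of $c$ servers and every pattern of received versions.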

Zhiying Wang | Viveck R. Cadambe
