Paper information - A Layered Architecture for Erasure-Coded Consistent Distributed Storage


Motivated by emerging applications of edge computing, we introduce a two-layer, erasure-coded, fault-tolerant distributed storage system that offers atomic access for read and write operations. In edge computing, clients interact with an edge layer of servers that is geographically nearby; the edge layer in turn interacts with a back-end layer of servers. The edge layer provides low-latency access and temporary storage for client operations, and uses the back-end layer for persistent storage. Our algorithm, termed the Layered Data Storage (LDS) algorithm, offers several features suitable for edge-computing systems, works in asynchronous message-passing environments, supports multiple readers and writers, and tolerates f1 < n1/2 and f2 < n2/3 crash failures in the two layers, which have n1 and n2 servers, respectively. We use a class of erasure codes known as regenerating codes to store data in the back-end layer. Choosing regenerating codes over popular alternatives such as Reed-Solomon codes not only optimizes the cost of back-end storage, but also helps optimize the communication cost of read operations when the value must be recreated all the way from the back-end. The two-layer architecture permits a modular implementation of the atomicity and erasure-code protocols; the implementation of erasure codes is mostly limited to the interaction between the two layers. We prove liveness and atomicity of LDS, and also compute the performance costs associated with read and write operations. In a system with n1 = Θ(n2), f1 = Θ(n1), f2 = Θ(n2), the write and read costs are Θ(n1) and Θ(1) + n1 I(δ > 0), respectively. Here δ is a parameter closely related to the number of write operations that are concurrent with the read operation, and I(δ > 0) is an indicator equal to 1 if δ > 0 and 0 if δ = 0. The cost of persistent storage in the back-end layer is Θ(1).
The impact of temporary storage is minimally felt in a multi-object system running N independent instances of LDS, where only a small fraction of the objects undergo concurrent accesses at any point during the execution. For the multi-object system, we identify a condition on the rate of concurrent writes in the system such that the overall storage cost is dominated by that of persistent storage in the back-end layer, and is given by Θ(N).
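As a rough illustration of the bounds quoted in the abstract, the following sketch computes the largest crash-failure counts that satisfy f1 < n1/2 and f2 < n2/3, and evaluates the read-cost expression Θ(1) + n1 I(δ > 0) in normalized units. The function names are hypothetical, and the constant hidden by Θ(1) is set to 1 purely as an assumption for the example; this is not code from the paper.

```python
def max_crash_failures(n: int, divisor: int) -> int:
    """Largest integer f satisfying the strict bound f < n / divisor."""
    if n < 1 or divisor < 1:
        raise ValueError("n and divisor must be positive")
    return (n - 1) // divisor


def read_cost_units(n1: int, delta: int, c: float = 1.0) -> float:
    """Evaluate c + n1 * I(delta > 0).

    c stands in for the Theta(1) term; its value of 1.0 is an assumption.
    """
    indicator = 1 if delta > 0 else 0
    return c + n1 * indicator


# Example sizes chosen only for illustration.
n1, n2 = 10, 9
f1 = max_crash_failures(n1, 2)  # edge layer: f1 < n1/2
f2 = max_crash_failures(n2, 3)  # back-end layer: f2 < n2/3

quiet_read = read_cost_units(n1, delta=0)       # no concurrent writes
concurrent_read = read_cost_units(n1, delta=3)  # some concurrent writes
```

With these example sizes, the edge layer tolerates up to 4 crashes and the back-end layer up to 2. The read cost stays constant when δ = 0 and picks up the additional n1 term as soon as δ > 0, which matches the indicator in the stated expression.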
