A Layered Architecture for Erasure-Coded Consistent Distributed Storage

Motivated by emerging applications in the edge computing paradigm, we introduce a two-layer, erasure-coded, fault-tolerant distributed storage system offering atomic access for read and write operations. In edge computing, clients interact with an edge layer of servers that is geographically close; the edge layer in turn interacts with a back-end layer of servers. The edge layer provides low-latency access and temporary storage for client operations, and uses the back-end layer for persistent storage. Our algorithm, termed the Layered Data Storage (LDS) algorithm, offers several features suitable for edge-computing systems, works in asynchronous message-passing environments, supports multiple readers and writers, and tolerates f1 < n1/2 and f2 < n2/3 crash failures in the two layers, which have n1 and n2 servers, respectively. We use a class of erasure codes known as regenerating codes to store data in the back-end layer. The choice of regenerating codes, instead of popular choices like Reed-Solomon codes, not only optimizes the cost of back-end storage but also helps optimize the communication cost of read operations when the value must be recreated all the way from the back-end layer. The two-layer architecture permits a modular implementation of the atomicity and erasure-code protocols; the implementation of erasure codes is mostly limited to the interaction between the two layers. We prove liveness and atomicity of LDS, and also compute the performance costs associated with read and write operations. In a system with n1 = Θ(n2), f1 = Θ(n1), f2 = Θ(n2), the write and read costs are Θ(n1) and Θ(1) + n1 I(δ > 0), respectively. Here δ is a parameter closely related to the number of write operations that are concurrent with the read operation, and I(δ > 0) is 1 if δ > 0 and 0 if δ = 0. The cost of persistent storage in the back-end layer is Θ(1).
The impact of temporary storage is minimally felt in a multi-object system running N independent instances of LDS, where only a small fraction of the objects undergo concurrent accesses at any point during the execution. For the multi-object system, we identify a condition on the rate of concurrent writes in the system such that the overall storage cost is dominated by that of persistent storage in the back-end layer, and is given by Θ(N).
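
The failure bounds and the shape of the read cost stated above can be made concrete with a small sketch. This is only an illustration, not part of the LDS algorithm: the function names `max_crash_failures` and `read_cost_shape` are hypothetical, and the hidden constant in Θ(1) is assumed to be 1 purely so that the expression can be evaluated.

```python
from typing import Tuple


def max_crash_failures(n1: int, n2: int) -> Tuple[int, int]:
    """Largest integers f1, f2 with f1 < n1/2 and f2 < n2/3.

    n1 is the number of edge-layer servers, n2 the number of
    back-end servers. (Hypothetical helper for illustration.)
    """
    if n1 < 1 or n2 < 1:
        raise ValueError("each layer needs at least one server")
    f1 = (n1 - 1) // 2  # strict inequality f1 < n1/2
    f2 = (n2 - 1) // 3  # strict inequality f2 < n2/3
    return f1, f2


def read_cost_shape(n1: int, delta: int) -> int:
    """Evaluate 1 + n1 * I(delta > 0).

    The leading 1 stands in for the Theta(1) term; the real
    constant is not given in the abstract (assumption).
    """
    indicator = 1 if delta > 0 else 0
    return 1 + n1 * indicator


if __name__ == "__main__":
    print(max_crash_failures(5, 7))
    print(read_cost_shape(5, 0), read_cost_shape(5, 3))
```

With no concurrent writes (δ = 0) the n1 term vanishes and only the constant term remains; any δ > 0 adds a term linear in the number of edge-layer servers.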
