Scale-out Edge Storage Systems with Embedded Storage Nodes to Get Better Availability and Cost-Efficiency At the Same Time

In the resource-rich environment of data centers, most failures can be handled by quickly failing over to redundant resources. In contrast, a failure in an edge infrastructure with limited resources may require maintenance personnel to drive to the location to fix the problem. The operational cost of these “truck rolls” to edge locations competes with the operational cost of the extra space and power needed for redundant resources at the edge. Computational storage devices with network interfaces can act as network-attached storage servers and offer a new design point for storage systems at the edge. In this paper, we hypothesize that a system consisting of a larger number of such small “embedded” storage nodes provides higher availability, owing to its larger number of failure domains, while also saving operational cost in terms of space and power. As evidence for our hypothesis, we compare the probability of data loss between two types of storage systems: one constructed with general-purpose servers and the other constructed with embedded storage nodes. Our results show that the storage system constructed with general-purpose servers has a 7 to 20 times higher risk of data loss than the storage system constructed with embedded storage nodes. We also compare the two alternatives in terms of power and space, using as a reference point the Media-Based Work Unit (MBWU) that we developed in an earlier paper [13].
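The failure-domain intuition behind the hypothesis can be illustrated with a small combinatorial sketch. It is not the model used for the 7 to 20 times result. Assume each object has `replicas` copies placed on distinct failure domains chosen uniformly at random, and that exactly `num_failed` of `num_domains` domains are down at once. An object is lost only if all of its replica domains are among the failed ones, so by linearity of expectation the expected fraction of lost objects is C(k, r) / C(N, r). The function name `expected_fraction_lost` and the 48-drive configuration (4 servers with 12 drives each versus 48 single-drive embedded nodes) are hypothetical.

```python
from math import comb


def expected_fraction_lost(num_domains: int, num_failed: int, replicas: int) -> float:
    """Expected fraction of objects whose replicas all sit in failed failure domains.

    Hypothetical model: each object's `replicas` copies are placed on distinct
    failure domains chosen uniformly at random, and exactly `num_failed` of the
    `num_domains` failure domains are down simultaneously.
    """
    if not 0 <= num_failed <= num_domains:
        raise ValueError("num_failed must be between 0 and num_domains")
    if not 1 <= replicas <= num_domains:
        raise ValueError("replicas must be between 1 and num_domains")
    # Lost placements: all replica domains drawn from the failed set.
    # math.comb returns 0 when num_failed < replicas, i.e. nothing is lost.
    return comb(num_failed, replicas) / comb(num_domains, replicas)


# Hypothetical example: the same 48 drives packaged as 4 servers with 12 drives
# each (4 failure domains) or as 48 single-drive embedded nodes (48 domains),
# with 3-way replication and 3 failure domains down at the same time.
servers = expected_fraction_lost(num_domains=4, num_failed=3, replicas=3)
embedded = expected_fraction_lost(num_domains=48, num_failed=3, replicas=3)
print(f"servers:  {servers:.6f}")   # servers:  0.250000
print(f"embedded: {embedded:.6f}")  # embedded: 0.000058
```

This sketch holds the number of simultaneously failed domains fixed. A fair comparison would also have to weigh how often that many domains fail, given per-domain failure rates and repair times such as truck rolls, which the sketch ignores. It also reports an expected fraction rather than the probability that at least one object is lost. With random replication and many objects, nearly any combination of `replicas` failed domains loses some object, as the Copysets work [1] points out.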

[1] Sachin Katti et al. Copysets: Reducing the Frequency of Data Loss in Cloud Storage, 2013, USENIX Annual Technical Conference.

[2] Kashi Venkatesh Vishwanath et al. Characterizing cloud computing hardware reliability, 2010, SoCC '10.

[3] James S. Plank et al. Mean Time to Meaningless: MTTDL, Markov Models, and Storage System Reliability, 2010, HotStorage.

[4] Weisong Shi et al. Edge Computing: Vision and Challenges, 2016, IEEE Internet of Things Journal.

[5] Weisong Shi et al. The Promise of Edge Computing, 2016, Computer.

[6] Haiying Shen et al. A Low-Cost Multi-failure Resilient Replication Scheme for High Data Availability in Cloud Storage, 2016, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC).

[7] Zheng Song et al. Reliable and efficient mobile edge computing in highly dynamic and volatile environments, 2017, 2017 Second International Conference on Fog and Mobile Edge Computing (FMEC).

[8] Paul Wood et al. Dependability in edge computing, 2017, Commun. ACM.

[9] Jun Wang et al. A new reliability model in replication-based big data storage systems, 2017, J. Parallel Distributed Comput.

[10] S. S. Venkata et al. Distribution System Reliability Assessment Using Hierarchical Markov Modeling, 1996, IEEE Power Engineering Review.

[11] George Pallis et al. Content Delivery Networks: Status and Trends, 2003, IEEE Internet Comput.

[12] Amar Phanishayee et al. FAWN: a fast array of wimpy nodes, 2009, SOSP '09.

[13] Carlos Maltzahn et al. MBWU: Benefit Quantification for Data Access Function Offloading, 2019, ISC Workshops.

[14] Vijay Janapa Reddi et al. Mobile CPU's rise to power: Quantifying the impact of generational mobile CPU design trends on performance, energy, and user satisfaction, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[15] Christopher B. Hauser et al. Reliability and Availability Properties of Distributed Database Systems, 2014, 2014 IEEE 18th International Enterprise Distributed Object Computing Conference.

[16] Jiesheng Wu et al. Lessons and Actions: What We Learned from 10K SSD-Related Storage System Failures, 2019, USENIX Annual Technical Conference.