Understanding system characteristics of online erasure coding on scalable, distributed and large-scale SSD array systems

Large-scale systems built on arrays of solid-state disks (SSDs) have become increasingly common in many computing segments. To make such systems resilient, erasure coding such as Reed-Solomon (RS) codes can be adopted as an alternative to replication, because erasure coding offers a significantly lower storage cost than replication. To understand the impact of erasure coding on system performance and on other system aspects such as CPU utilization and network traffic, we build a storage cluster consisting of approximately one hundred processor cores and more than fifty high-performance SSDs, and evaluate the cluster with a popular open-source distributed parallel file system, Ceph. We then analyze the behavior of systems that adopt erasure coding, compared with that of systems using replication, from the following five viewpoints: (1) storage system I/O performance; (2) computing and software overheads; (3) I/O amplification; (4) network traffic among storage nodes; (5) the impact of physical data layout on the performance of RS-coded SSD arrays. For all these analyses, we examine two representative RS configurations, which are used by the Google and Facebook file systems, and compare them with triple replication, which a typical parallel file system employs as its default fault-tolerance mechanism. Lastly, we collect 54 block-level traces from the cluster and make them available to other researchers.
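
The storage-cost argument can be illustrated with a small calculation. The abstract does not state the code parameters, so this sketch assumes the two configurations are RS(6,3) and RS(10,4), the parameters commonly attributed to the Google and Facebook file systems. The helper names `rs_overhead` and `replication_overhead` are hypothetical and used only for illustration.

```python
from fractions import Fraction


def rs_overhead(k: int, m: int) -> Fraction:
    """Raw bytes stored per byte of user data for an RS(k, m) stripe
    (k data chunks plus m coding chunks)."""
    return Fraction(k + m, k)


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data with `copies` full replicas."""
    return Fraction(copies, 1)


# Assumed parameters (not given in the abstract): RS(6,3) and RS(10,4).
# The second tuple element is the number of simultaneous chunk/replica
# losses each scheme can survive without losing data.
schemes = {
    "3-replication": (replication_overhead(3), 2),
    "RS(6,3)": (rs_overhead(6, 3), 3),
    "RS(10,4)": (rs_overhead(10, 4), 4),
}

for name, (overhead, tolerated) in schemes.items():
    print(f"{name}: {float(overhead):.2f}x raw storage, "
          f"tolerates {tolerated} lost chunks")
```

Under these assumed parameters, both RS configurations store at most half the raw data that triple replication does while tolerating more simultaneous losses. That saving is what the overheads examined in the five viewpoints above are weighed against.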
