Paper information - Understanding system characteristics of online erasure coding on scalable, distributed and large-scale SSD array systems


Large-scale systems with arrays of solid state disks (SSDs) have become increasingly common in many computing segments. To make such systems resilient, we can adopt erasure coding, such as Reed-Solomon (RS) codes, as an alternative to replication, because erasure coding offers a significantly lower storage cost than replication. To understand the impact of erasure coding on system performance and on other system aspects such as CPU utilization and network traffic, we build a storage cluster consisting of approximately one hundred processor cores and more than fifty high-performance SSDs, and evaluate the cluster with a popular open-source distributed parallel file system, Ceph. We then analyze the behavior of systems that adopt erasure coding, compared with that of systems using replication, from the following five viewpoints: (1) storage system I/O performance; (2) computing and software overheads; (3) I/O amplification; (4) network traffic among storage nodes; (5) the impact of physical data layout on the performance of RS-coded SSD arrays. For all these analyses, we examine two representative RS configurations, which are used by the Google and Facebook file systems, and compare them with triple replication, which a typical parallel file system employs as its default fault tolerance mechanism. Lastly, we collect 54 block-level traces from the cluster and make them available to other researchers.
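The storage-cost argument in the abstract can be made concrete with a small calculation. The sketch below is illustrative only: the abstract does not name the two RS configurations, so RS(6,3) and RS(10,4) are assumed here because they are the settings commonly attributed to Google and Facebook, and the function names are hypothetical. An RS(k, m) code stores k data chunks plus m coding chunks, so it writes (k + m) / k raw bytes per byte of user data and tolerates the loss of any m chunks.

```python
from fractions import Fraction


def rs_storage_overhead(k: int, m: int) -> Fraction:
    """Raw bytes stored per byte of user data for an RS(k, m) code."""
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m must be non-negative")
    return Fraction(k + m, k)


def replication_storage_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data for n-way replication."""
    if copies <= 0:
        raise ValueError("copies must be positive")
    return Fraction(copies, 1)


# Assumed configurations (not stated in the abstract): RS(6,3) and RS(10,4).
schemes = {
    "RS(6,3)": rs_storage_overhead(6, 3),
    "RS(10,4)": rs_storage_overhead(10, 4),
    "3-replication": replication_storage_overhead(3),
}

for name, overhead in schemes.items():
    print(f"{name}: {float(overhead):.2f}x raw storage per byte of user data")
```

Under these assumptions the RS schemes need 1.5x and 1.4x raw capacity, against 3x for triple replication. The calculation covers capacity only; the CPU, I/O amplification, and network costs that the paper measures are not modeled.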
