CPU: Cross-Rack-Aware Pipelining Update for Erasure-Coded Storage

Erasure coding is widely used in distributed storage systems (DSSs) to achieve fault tolerance efficiently. However, when the original data are updated, all of the corresponding encoded (parity) blocks must be updated as well, which leads to long update times and high bandwidth consumption. Existing solutions mainly focus on coding schemes that minimize the size of the transmitted update information, while overlooking more efficient use of the bandwidth among the racks involved in an update. In this paper, we propose a parallel Cross-rack Pipelining Update scheme (CPU), which divides the update information into small units (slices) and transmits them in parallel along an update pipeline path across multiple racks. The performance of CPU is mainly determined by the slice size and the update path. More slices enable finer-grained parallel transmission over cross-rack links, but they also introduce more overhead. An update path that traverses all racks over high-bandwidth links yields a short update time. Based on a new theoretical pipelining update model, we formulate the proposed pipelining update scheme as an optimization problem. We prove that this problem is NP-hard and develop a heuristic algorithm to solve it, exploiting two features of practical DSSs and our implementation: big chunks and small overhead. Specifically, we first determine the best update path by solving a max-min problem and then decide the slice size. We further simplify slice size selection by learning offline a range of interest (RoI), within which all slice sizes provide similar performance. We implement CPU and conduct experiments on Amazon EC2 under a variety of scenarios. The results show that CPU can reduce the average update time by 48.2% compared with state-of-the-art update schemes.
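The two-step decision described above (pick a max-min update path first, then a slice size) can be illustrated with a toy Python sketch. Everything in it is an assumption for illustration and is not taken from CPU's implementation: the timing model (the first slice crosses every hop in turn, later slices are paced by the slowest hop, and each slice pays a fixed overhead), the brute-force path search (exponential in the number of racks, so it is not the paper's heuristic), the source rack where the path starts, the function names `pipeline_update_time`, `best_update_path`, `choose_slice_size`, and `near_optimal_range`, and all bandwidth numbers are hypothetical. `near_optimal_range` only scans candidates under this model and stands in for the offline-learned RoI.

```python
import math
from itertools import permutations


def pipeline_update_time(chunk_mb, slice_mb, path_bw, overhead_s):
    """Toy pipelined update time (seconds) for one chunk along a fixed path.

    Assumed model, not CPU's published one: the chunk is cut into
    n = ceil(chunk / slice) slices; the first slice crosses every hop in
    turn, each later slice trails by one bottleneck-hop time, and every
    slice pays a fixed overhead.
    """
    n = math.ceil(chunk_mb / slice_mb)             # number of slices
    hop_times = [slice_mb / bw for bw in path_bw]  # seconds per slice per hop (MB / MB/s)
    fill = sum(hop_times)                          # first slice traverses the whole path
    drain = (n - 1) * max(hop_times)               # later slices, paced by the slowest hop
    return fill + drain + n * overhead_s


def best_update_path(source, racks, bw):
    """Brute-force max-min path: start at `source`, visit every rack once,
    and maximize the smallest cross-rack bandwidth on the path.

    `bw` maps frozenset({a, b}) -> bandwidth in MB/s (symmetric links).
    Exponential in the number of racks; for illustration only.
    """
    others = [r for r in racks if r != source]
    best_path, best_bottleneck = None, -1.0
    for order in permutations(others):
        path = (source,) + order
        bottleneck = min(bw[frozenset(hop)] for hop in zip(path, path[1:]))
        if bottleneck > best_bottleneck:
            best_path, best_bottleneck = list(path), bottleneck
    return best_path, best_bottleneck


def choose_slice_size(chunk_mb, candidates_mb, path_bw, overhead_s):
    """Pick the candidate slice size with the lowest modeled update time."""
    return min(candidates_mb,
               key=lambda s: pipeline_update_time(chunk_mb, s, path_bw, overhead_s))


def near_optimal_range(chunk_mb, candidates_mb, path_bw, overhead_s, tol=0.05):
    """Slice sizes whose modeled time is within `tol` of the best one
    (a stand-in for the offline-learned range of interest)."""
    times = {s: pipeline_update_time(chunk_mb, s, path_bw, overhead_s)
             for s in candidates_mb}
    best = min(times.values())
    return [s for s in candidates_mb if times[s] <= best * (1 + tol)]


if __name__ == "__main__":
    # Hypothetical 4-rack topology with made-up bandwidths (MB/s).
    links = {("R0", "R1"): 100, ("R0", "R2"): 40, ("R0", "R3"): 60,
             ("R1", "R2"): 90, ("R1", "R3"): 30, ("R2", "R3"): 70}
    bw = {frozenset(pair): v for pair, v in links.items()}

    path, bottleneck = best_update_path("R0", ["R0", "R1", "R2", "R3"], bw)
    print(path, bottleneck)  # ['R0', 'R1', 'R2', 'R3'] 70

    path_bw = [bw[frozenset(hop)] for hop in zip(path, path[1:])]
    sizes = [1, 2, 4, 8, 16, 32, 64]
    print(choose_slice_size(64, sizes, path_bw, 0.01))   # 4
    print(near_optimal_range(64, sizes, path_bw, 0.01))  # [4, 8]
```

With these made-up numbers, 4 MB and 8 MB slices land within 5% of each other, loosely mirroring the observation that a range of slice sizes performs similarly: in this model, smaller slices lose to per-slice overhead and larger ones lose pipelining overlap.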
