Boosting Full-Node Repair in Erasure-Coded Storage

Erasure coding is a common choice for fault tolerance in today’s storage systems, yet it is still hampered by the substantial traffic that repair induces. A variety of erasure codes and repair algorithms have been designed in recent years to reduce repair traffic, but our analysis shows that they remain subject to several limitations that restrict or even negate their performance gains. We present RepairBoost, a scheduling framework that helps existing linear erasure codes and repair algorithms boost full-node repair performance. RepairBoost builds on three design primitives: (i) repair abstraction, which uses a directed acyclic graph to characterize a single-chunk repair process; (ii) repair traffic balancing, which balances upload and download repair traffic simultaneously; and (iii) transmission scheduling, which carefully dispatches the requested chunks to saturate as much unoccupied bandwidth as possible. Extensive experiments on Amazon EC2 show that RepairBoost accelerates repair by 35.0-97.1% for various erasure codes and repair algorithms.
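To make the repair abstraction more concrete, the following is a minimal illustrative sketch, not RepairBoost's actual data structures or algorithms. It assumes a single-chunk repair can be written as a list of directed edges (sender, receiver), each carrying one chunk, and compares two hypothetical layouts: a star, in which every helper sends directly to the requestor, and a chain, in which partial results are forwarded along a path. The function names, node labels, and the choice of four helpers are all assumptions made for illustration.

```python
from collections import Counter


def star_repair(helpers, requestor):
    """Hypothetical conventional layout: each helper sends its chunk to the requestor."""
    return [(h, requestor) for h in helpers]


def chain_repair(helpers, requestor):
    """Hypothetical pipelined layout: partial results are forwarded along a path."""
    path = list(helpers) + [requestor]
    return list(zip(path, path[1:]))


def traffic(edges):
    """Count chunks uploaded (out-degree) and downloaded (in-degree) per node."""
    upload = Counter(src for src, _ in edges)
    download = Counter(dst for _, dst in edges)
    return upload, download


if __name__ == "__main__":
    helpers = ["N1", "N2", "N3", "N4"]  # assumed helper nodes
    for name, build in [("star", star_repair), ("chain", chain_repair)]:
        up, down = traffic(build(helpers, "R"))
        print(name, "max upload:", max(up.values()),
              "max download:", max(down.values()))
```

In this toy setting both layouts move the same number of chunks, but the star concentrates all downloads on the requestor while the chain spreads them across nodes. Per-node upload and download counts of this kind are what a traffic-balancing step would need to reason about.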
