Boosting Degraded Reads in Heterogeneous Erasure-Coded Storage Systems

Distributed storage systems provide large-scale data storage services, yet they are confronted with frequent node failures. To ensure data availability, a storage system often introduces data redundancy via replication or erasure coding. As erasure coding incurs significantly less redundancy overhead than replication under the same fault tolerance, it has been increasingly adopted in large-scale storage systems. In erasure-coded storage systems, degraded reads to temporarily unavailable data are very common, and hence boosting the performance of degraded reads becomes important. One challenge is that storage nodes tend to be heterogeneous with different storage capacities and I/O bandwidths. To this end, we propose FastDR, a system that addresses node heterogeneity and exploits I/O parallelism, so as to boost the performance of degraded reads to temporarily unavailable data. FastDR incorporates a greedy algorithm that seeks to reduce the data transfer cost of reading surviving data for degraded reads, while allowing the search of the efficient degraded read solution to be completed in a timely manner. We implement a FastDR prototype, and conduct extensive evaluation through simulation studies as well as testbed experiments on a Hadoop cluster with 10 storage nodes. We demonstrate that our FastDR achieves efficient degraded reads compared to existing approaches.

[1]  Lluis Pamies-Juarez,et al.  The CORE Storage Primitive: Cross-Object Redundancy for Efficient Data Repair & Access in Erasure Coded Storage , 2013, ArXiv.

[2]  Albert G. Greenberg,et al.  Scarlett: coping with skewed content popularity in mapreduce clusters , 2011, EuroSys '11.

[3]  John C. S. Lui,et al.  Single Disk Failure Recovery for X-Code-Based Parallel Storage Systems , 2014, IEEE Transactions on Computers.

[4]  James S. Plank,et al.  AONT-RS: Blending Security and Performance in Dispersed Storage Systems , 2011, FAST.

[5]  Hong Jiang,et al.  WorkOut: I/O Workload Outsourcing for Boosting RAID Reconstruction Performance , 2009, FAST.

[6]  Sanjay Ghemawat,et al.  MapReduce: simplified data processing on large clusters , 2008, CACM.

[7]  Catherine D. Schuman,et al.  A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries for Storage , 2009, FAST.

[8]  Cheng Huang,et al.  Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads , 2012, FAST.

[9]  Daniel P. Siewiorek,et al.  Fast, on-line failure recovery in redundant disk arrays , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[10]  James S. Plank A New Minimum Density RAID-6 Code with a Word Size of Eight , 2008, 2008 Seventh IEEE International Symposium on Network Computing and Applications.

[11]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[12]  Xin Wang,et al.  Building parallel regeneration trees in distributed storage systems with asymmetric links , 2010, 6th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2010).

[13]  Marek Karpinski,et al.  An XOR-based erasure-resilient coding scheme , 1995 .

[14]  Cheng Huang,et al.  STAR : An Efficient Coding Scheme for Correcting Triple Storage Node Failures , 2005, IEEE Transactions on Computers.

[15]  James S. Plank,et al.  Minimum density RAID-6 codes , 2011, TOS.

[16]  Xin Wang,et al.  Tree-structured Data Regeneration in Distributed Storage Systems with Regenerating Codes , 2010, 2010 Proceedings IEEE INFOCOM.

[17]  Peter Sobe,et al.  Flexible Parameterization of XOR based Codes for Distributed Storage , 2008, 2008 Seventh IEEE International Symposium on Network Computing and Applications.

[18]  Patrick P. C. Lee,et al.  On the speedup of single-disk failure recovery in XOR-coded storage systems: Theory and practice , 2012, 012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST).

[19]  Gregory R. Ganger,et al.  Ursa minor: versatile cluster-based storage , 2005, FAST'05.

[20]  Van-Anh Truong,et al.  Availability in Globally Distributed Storage Systems , 2010, OSDI.

[21]  Stefan Savage,et al.  Total Recall: System Support for Automated Availability Management , 2004, NSDI.

[22]  Hong Jiang,et al.  PRO: A Popularity-based Multi-threaded Reconstruction Optimization for RAID-Structured Storage Systems , 2007, FAST.

[23]  Cheng Huang,et al.  Erasure Coding in Windows Azure Storage , 2012, USENIX Annual Technical Conference.

[24]  Ethan L. Miller,et al.  Evaluation of distributed recovery in large-scale storage systems , 2004, Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004..

[25]  John Kubiatowicz,et al.  Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.

[26]  Srikanth Kandula,et al.  Leveraging endpoint flexibility in data-intensive clusters , 2013, SIGCOMM.

[27]  Ethan L. Miller,et al.  Reliability of flat XOR-based erasure codes on heterogeneous devices , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[28]  Andrea C. Arpaci-Dusseau,et al.  Association Proceedings of the Third USENIX Conference on File and Storage Technologies San Francisco , CA , USA March 31 – April 2 , 2004 , 2004 .

[29]  Sujata Banerjee,et al.  Measuring Bandwidth Between PlanetLab Nodes , 2005, PAM.

[30]  GhemawatSanjay,et al.  The Google file system , 2003 .

[31]  Patrick P. C. Lee,et al.  A cost-based heterogeneous recovery scheme for distributed storage systems with RAID-6 codes , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).

[32]  Randy H. Katz,et al.  Improving MapReduce Performance in Heterogeneous Environments , 2008, OSDI.

[33]  Ju Wang,et al.  Windows Azure Storage: a highly available cloud storage service with strong consistency , 2011, SOSP.

[34]  Alexandros G. Dimakis,et al.  Rebuilding for array codes in distributed storage systems , 2010, 2010 IEEE Globecom Workshops.

[35]  Xiaozhou Li,et al.  Flat XOR-based erasure codes in storage systems: Constructions, efficient recovery, and tradeoffs , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[36]  A GibsonGarth,et al.  Architectures and algorithms for on-line failure recovery in redundant disk arrays , 1994 .

[37]  Dimitris S. Papailiopoulos,et al.  XORing Elephants: Novel Erasure Codes for Big Data , 2013, Proc. VLDB Endow..

[38]  Daniel P. Siewiorek,et al.  Architectures and algorithms for on-line failure recovery in redundant disk arrays , 1994, Distributed and Parallel Databases.

[39]  Jian Lin,et al.  CORE: Augmenting regenerating-coding-based recovery for single and concurrent failures in distributed storage systems , 2013, 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST).

[40]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[41]  Ben Y. Zhao,et al.  OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.

[42]  John C. S. Lui,et al.  A Hybrid Approach to Failed Disk Recovery Using RAID-6 Codes: Algorithms and Performance Evaluation , 2011, TOS.

[43]  Alexandros G. Dimakis,et al.  Network Coding for Distributed Storage Systems , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.