Parallelizing Degraded Read for Erasure Coded Cloud Storage Systems Using Collective Communications

For lower storage costs, storage systems are increasingly transitioning to erasure codes instead of replication. However, the increased amount of data that must be read and transferred during recovery in an erasure-coded system leads to high degraded read latency. We design a new parallel degraded read method, Collective Reconstruction Read, which overcomes the high degraded read latency of erasure coding through parallel reconstruction. By introducing collective communication operations (e.g., all-to-one reduction and all-to-all reduction) into distributed storage systems, data reading, transfer, and decoding are performed by all of the involved data nodes in parallel rather than by the client alone. The time complexity of the degraded read operation is therefore reduced from linear to logarithmic. We implement Collective Reconstruction Read in HDFS-RAID and evaluate it as the block size and stripe size vary. We find that these algorithms reduce degraded read latency significantly, thereby improving system availability. Specifically, experimental results indicate an approximately 55% to 81% reduction in degraded read latency.

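The sketch below illustrates the all-to-one reduction idea described above: each surviving node contributes a partial reconstruction term, and the partial results are merged pairwise over a logarithmic number of rounds instead of being streamed sequentially to the client. This is a minimal illustration, not the paper's HDFS-RAID implementation; the combine step is simplified to XOR (i.e., single-parity repair), whereas a Reed-Solomon code would combine Galois-field-scaled terms. The class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Minimal sketch of an all-to-one reduction for degraded read
 * (hypothetical names; XOR stands in for the real decode step).
 */
public class CollectiveReconstructionSketch {

    // Combine two partial reconstruction results (XOR = single-parity repair).
    static byte[] combine(byte[] a, byte[] b) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    // Tree-structured reduction: ceil(log2(n)) rounds, each round merging
    // pairs of partial results. In a real deployment each pair merge runs
    // on a different data node concurrently; here we simply iterate.
    static byte[] allToOneReduce(byte[][] partials) {
        int n = partials.length;
        byte[][] buf = Arrays.copyOf(partials, n);
        int rounds = 0;
        for (int stride = 1; stride < n; stride *= 2) {
            rounds++;
            for (int i = 0; i + stride < n; i += 2 * stride) {
                buf[i] = combine(buf[i], buf[i + stride]);
            }
        }
        System.out.println("reduction rounds (logarithmic): " + rounds);
        return buf[0];
    }

    public static void main(String[] args) {
        int k = 8, blockSize = 16;
        byte[][] blocks = new byte[k][blockSize];
        byte[] parity = new byte[blockSize];
        for (int i = 0; i < k; i++) {
            Arrays.fill(blocks[i], (byte) (i + 1));
            parity = combine(parity, blocks[i]);
        }
        // Suppose block 0 is lost: the parity block plus the surviving
        // blocks 1..k-1 are the partial terms fed into the reduction.
        byte[][] partials = new byte[k][];
        partials[0] = parity;
        for (int i = 1; i < k; i++) partials[i] = blocks[i];
        byte[] recovered = allToOneReduce(partials);
        System.out.println("recovered block 0: "
                + Arrays.equals(recovered, blocks[0]));
    }
}
```

With 8 partial terms the reduction completes in 3 rounds rather than 7 sequential transfers to the client, which is the source of the linear-to-logarithmic improvement claimed in the abstract.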