Towards Building Reliable and Cost-Efficient Distributed Storage Systems

Reliability and cost are two important targets for distributed storage systems. Over the years, numerous schemes have been proposed to improve the reliability or reduce the cost of distributed storage systems, and they fall into three categories: (1) data redundancy schemes; (2) data placement schemes; and (3) data repair schemes. However, it remains unclear how to build a reliable and cost-efficient distributed storage system, because of (i) insufficient consideration of how different schemes combine; and (ii) insufficient consideration of the failures and recoveries of different subsystems (racks, nodes, disks, and sectors). To measure the reliability and cost that result from different schemes, we design and implement CR-SIM, a Comprehensive Reliability SIMulator for distributed storage systems. It accounts for a range of factors, including the system topology, the data redundancy scheme, the data placement scheme, the data repair scheme, and the failure/recovery models of different subsystems. Using CR-SIM, we conduct a series of simulation-based experiments, whose results reveal several findings that are helpful for building reliable and cost-efficient distributed storage systems. For public use, we have open-sourced our code at https://github.com/yichuan0707/CR-SIM.
