A new reliability model in replication-based big data storage systems

Abstract: Reliability is a critical metric in the design and development of replication-based big data storage systems such as the Hadoop Distributed File System (HDFS). In a system with thousands of machines and storage devices, even infrequent failures become likely. In the Google File System, the annual disk failure rate is 2.88%, which translates to an expected 8,760 disk failures per year, roughly one every hour, across the fleet. Unfortunately, given an increasing number of node failures, how often a cluster starts losing data when scaled out is not well investigated. Moreover, there is no systematic method to quantify the reliability of multi-way replication based data placement schemes, which have been widely adopted in enterprise large-scale storage systems to improve I/O parallelism. In this paper, we develop a new reliability model that incorporates the probability of replica loss to investigate the system reliability of multi-way declustering data layouts, and we analyze their potential for parallel recovery. Our comprehensive simulation results in MATLAB and SHARPE show that the shifted declustering data layout outperforms the random declustering layout in a multi-way replication scale-out architecture, reducing data loss probability and improving system reliability by up to 63% and 85%, respectively. Our study of both 5-year and 10-year system reliability under various recovery bandwidth settings shows that the shifted declustering layout surpasses the two baseline approaches in both cases, consuming up to 79% and 87% less recovery bandwidth than the copyset layout, and 4.8% and 10.2% less recovery bandwidth than the random layout.
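To make the abstract's notion of replica-loss probability concrete, the following minimal Monte Carlo sketch (Python, not the authors' MATLAB/SHARPE model) estimates the chance that a cluster using 3-way replication with random declustering loses at least one chunk when several nodes fail at once. All parameters (100 nodes, 5,000 chunks, 3 concurrent failures, trial count) are illustrative assumptions, not values from the paper.

import random

def estimate_loss_probability(num_nodes=100, num_chunks=5000,
                              replication=3, failed_nodes=3, trials=1000):
    """Estimate P(at least one chunk loses all replicas) given that
    exactly `failed_nodes` nodes fail simultaneously, when each chunk's
    replicas are placed on a uniformly random set of nodes."""
    nodes = list(range(num_nodes))
    losses = 0
    for _ in range(trials):
        # Random declustering: each chunk's replicas land on a random
        # replication-sized node set, so many distinct sets are exposed.
        placements = [frozenset(random.sample(nodes, replication))
                      for _ in range(num_chunks)]
        failed = set(random.sample(nodes, failed_nodes))
        # Data is lost iff some chunk had every replica on a failed node.
        if any(p <= failed for p in placements):
            losses += 1
    return losses / trials

if __name__ == "__main__":
    p = estimate_loss_probability()
    print(f"Estimated P(data loss | 3 simultaneous node failures): {p:.3f}")

Under fully random placement, many distinct replica sets are exposed, so the loss probability grows as the cluster scales out; layouts such as copyset and shifted declustering instead constrain which node sets may co-host replicas, trading the number of failure patterns that can cause loss against recovery parallelism and bandwidth.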

[1] Ali Saman Tosun. Analysis and Comparison of Replicated Declustering Schemes, 2007, IEEE Transactions on Parallel and Distributed Systems.

[2] Kishor S. Trivedi. Probability and Statistics with Reliability, Queuing, and Computer Science Applications, 1984.

[3] Tom W. Keller, et al. A comparison of high-availability media recovery techniques, 1989, SIGMOD '89.

[4] Ernst W. Biersack, et al. Modeling and Performance Comparison of Reliability Strategies for Distributed Video Servers, 2000, IEEE Transactions on Parallel and Distributed Systems.

[5] Donald F. Towsley, et al. A Performance Evaluation of RAID Architectures, 1996, IEEE Transactions on Computers.

[6] Jun Wang, et al. Shifted declustering: a placement-ideal layout scheme for multi-way replication storage architecture, 2008, ICS '08.

[7] Ethan L. Miller, et al. Replication under scalable hashing: a family of algorithms for scalable decentralized data distribution, 2004, 18th International Parallel and Distributed Processing Symposium (IPDPS).

[8] Bianca Schroeder, et al. Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you?, 2007, ACM Transactions on Storage (TOS).

[9] Robert J. Chansler, et al. Data Availability and Durability with the Hadoop Distributed File System, 2012, ;login: The USENIX Magazine.

[10] Randolph Nelson. Probability, Stochastic Processes, and Queueing Theory, 1995.

[11] David J. DeWitt, et al. Chained declustering: a new availability strategy for multiprocessor database machines, 1990, Proceedings of the Sixth International Conference on Data Engineering.

[12] Peter J. Varman, et al. pClock: an arrival curve based approach for QoS guarantees in shared storage systems, 2007, SIGMETRICS '07.

[13] Sachin Katti, et al. Copysets: Reducing the Frequency of Data Loss in Cloud Storage, 2013, USENIX Annual Technical Conference.

[14] Miguel Castro, et al. Farsite: federated, available, and reliable storage for an incompletely trusted environment, 2002, OSDI '02.

[15] Mario Blaum, et al. Mirrored Disk Organization Reliability Analysis, 2006, IEEE Transactions on Computers.

[16] Philip S. Yu, et al. Using rotational mirrored declustering for replica placement in a disk-array-based video server, 1997, Multimedia Systems.

[17] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1990.

[18] Ethan L. Miller, et al. Evaluation of distributed recovery in large-scale storage systems, 2004, Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing (HPDC).

[19] Bruce G. Lindsay, et al. Random sampling techniques for space efficient online computation of order statistics of large datasets, 1999, SIGMOD '99.

[20] T. S. Eugene Ng, et al. Understanding the effects and implications of compute node related failures in Hadoop, 2012, HPDC '12.

[21] Garth A. Gibson, et al. Parity declustering for continuous operation in redundant disk arrays, 1992, ASPLOS V.

[22] Howard Gobioff, et al. The Google file system, 2003, SOSP '03.

[23] Scott A. Brandt, et al. Reliability mechanisms for very large storage systems, 2003, Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003).

[24] Yuanyuan Yang, et al. Reliability Analysis on Shifted and Random Declustering Block Layouts in Scale-Out Storage Architectures, 2014, 9th IEEE International Conference on Networking, Architecture, and Storage (NAS).

[25] Werner Vogels, et al. Dynamo: Amazon's highly available key-value store, 2007, SOSP '07.

[26] Bin Zhou, et al. Scalable Performance of the Panasas Parallel File System, 2008, FAST '08.