Reliability of Clustered vs. Declustered Replica Placement in Data Storage Systems

The placement of replicas across storage nodes in a replication-based storage system is known to affect rebuild times and therefore system reliability. Earlier work has shown that, for a replication factor of two, reliability is essentially unaffected by the replica placement scheme because all placement schemes have mean times to data loss (MTTDLs) within a factor of two of each other for practical values of the failure rate, storage capacity, and rebuild bandwidth of a storage node. For higher replication factors, however, simulation results reveal that this no longer holds. Moreover, an analytical derivation of the MTTDL becomes intractable for general placement schemes. In this paper, we develop a theoretical model that is applicable to any replication factor and provides a good approximation of the MTTDL for small failure rates. This model characterizes the system behavior through an analytically tractable measure of reliability: the probability of the shortest path to data loss following the first node failure. We show that, for highly reliable systems, this measure closely approximates the probability of data loss by any path after the first node failure and before the rebuild completes, and that it yields a rough estimate of the MTTDL. The results are of both theoretical and practical importance and are confirmed by simulation. In particular, contrary to intuition, for replication factors greater than two the declustered placement scheme offers a reliability that does not decrease as the number of nodes in the system increases.

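To make the "shortest path to data loss" measure concrete, the following is a minimal Monte Carlo sketch, not the paper's model: it assumes exponential node lifetimes, a fixed rebuild window, that under clustered placement only the failed node's r-1 group peers hold copies of its data, that under declustered placement any r-1 further failures among the survivors destroy some block (plausible when each node's blocks are spread over all other nodes and the number of blocks is large), and that distributed rebuild shortens the window by a factor of roughly n-1. All names (p_loss_after_first_failure, lam, c_over_b) and parameter values are illustrative, and the failure rate is deliberately exaggerated so that loss events appear in a modest number of trials.

```python
import random

def p_loss_after_first_failure(n, r, lam, rebuild_time,
                               clustered, trials=200_000):
    """Estimate P(data loss | first node failure) by Monte Carlo.

    After the first failure at t = 0, each surviving node fails
    independently with an exponential lifetime of rate lam. Data is
    lost if r - 1 further failures occur before the rebuild finishes.

    clustered=True: only the failed node's r - 1 group peers hold its
    replicas, so exactly those peers must all fail within the window.
    clustered=False (declustered): with many blocks spread over all
    nodes, any r - 1 of the n - 1 survivors failing within the window
    destroys the last replica of some block.
    """
    losses = 0
    for _ in range(trials):
        if clustered:
            peer_deaths = [random.expovariate(lam) for _ in range(r - 1)]
            if max(peer_deaths) < rebuild_time:
                losses += 1
        else:
            failed = sum(1 for _ in range(n - 1)
                         if random.expovariate(lam) < rebuild_time)
            if failed >= r - 1:
                losses += 1
    return losses / trials

n, r = 100, 3
lam = 1 / 100.0   # node failure rate (deliberately high, illustrative)
c_over_b = 20.0   # time to rebuild one node's data at one node's bandwidth

# Declustered rebuild is spread over all survivors, so its window is
# taken to be roughly n - 1 times shorter than the clustered one.
p_clus = p_loss_after_first_failure(n, r, lam, c_over_b, clustered=True)
p_decl = p_loss_after_first_failure(n, r, lam, c_over_b / (n - 1),
                                    clustered=False)
print(f"P(loss | first failure): clustered ~{p_clus:.2e}, "
      f"declustered ~{p_decl:.2e}")

# Rough MTTDL via the first-failure approximation (an assumption here):
# MTTDL ~ 1 / (n * lam * P(loss | first failure)).
for name, p in (("clustered", p_clus), ("declustered", p_decl)):
    if p > 0:
        print(f"rough MTTDL ({name}): ~{1 / (n * lam * p):.3g}")
```

In this toy model the declustered scheme trades a larger failure domain (any r-1 further failures are fatal) for a rebuild window that shrinks as the system grows, which is the mechanism behind the observation above for replication factors greater than two.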