Adaptive Multi-copy Layout Algorithm Based on Mass Storage System

Large-scale storage systems face significant challenges in reliability and adaptability, and therefore need data layout algorithms that are reliable, adaptive and effective. Existing studies meet these goals only partially. This paper first proposes a reliable copy data layout algorithm (RCDL) and an effective adaptive data layout algorithm (ADL), and then combines the two into a multi-copy adaptive data layout algorithm, MCADL, which achieves better reliability, adaptability and effectiveness. RCDL distributes the copies of the same data across different storage devices so that no two replicas of the same data reside on adjacent storage devices, which yields higher redundancy and fault tolerance. ADL combines a clustering algorithm with consistent hashing and introduces only a small number of virtual storage devices, greatly reducing storage space consumption. Data are distributed fairly according to the weights of the storage devices, so the layout adapts to both expansion and reduction of the system. To exploit the respective advantages of RCDL and ADL, MCADL divides data into hot and cold data according to access frequency: hot data use the RCDL layout and cold data use the ADL layout. Theoretical and experimental results show that MCADL achieves higher redundancy and fault tolerance; distributes data fairly according to the weights of the storage devices; adaptively adds and removes storage devices; migrates an optimal amount of data when the scale of the storage system changes; and locates data quickly while consuming less storage space.
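
The ADL idea of giving each storage device a share of the hash space proportional to its weight can be illustrated with a weighted consistent-hash ring. This is a minimal sketch under stated assumptions, not the paper's ADL: it omits the clustering step entirely, and the `WeightedRing` class, the SHA-256 hash, and the `vnodes_per_weight` parameter (virtual nodes per unit of device weight) are hypothetical choices made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the hash ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class WeightedRing:
    """Consistent-hash ring in which each device owns virtual nodes in proportion to its weight."""

    def __init__(self, vnodes_per_weight: int = 32):
        self.vnodes_per_weight = vnodes_per_weight  # hypothetical tuning parameter
        self._points: list[int] = []                # sorted ring positions of all virtual nodes
        self._owner: dict[int, str] = {}            # ring position -> physical device name

    def add_device(self, name: str, weight: int) -> None:
        # A device of weight w gets w * vnodes_per_weight virtual nodes.
        for i in range(weight * self.vnodes_per_weight):
            pos = _hash(f"{name}#{i}")
            bisect.insort(self._points, pos)
            self._owner[pos] = name

    def remove_device(self, name: str) -> None:
        # Dropping a device's virtual nodes hands its keys to the next nodes clockwise.
        self._points = [p for p in self._points if self._owner[p] != name]
        self._owner = {p: d for p, d in self._owner.items() if d != name}

    def locate(self, key: str) -> str:
        # The first virtual node clockwise from the key's position owns the key.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = WeightedRing()
ring.add_device("disk-a", 1)
ring.add_device("disk-b", 1)
ring.add_device("disk-c", 2)

keys = [f"block-{i}" for i in range(10_000)]
before = {k: ring.locate(k) for k in keys}

ring.add_device("disk-d", 2)  # expand the system with a weight-2 device
after = {k: ring.locate(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys migrated")

# Keys only ever migrate onto the new device, never between existing ones.
assert all(after[k] == "disk-d" for k in moved)
```

Because a new device only claims ring positions, every key that moves after an expansion moves onto the new device, which is the property behind migrating only the necessary data when the system grows; removing the device restores the previous mapping. With few virtual nodes, plain consistent hashing balances load only roughly, and the paper's claim of needing only a small number of virtual devices rests on its clustering step, which this sketch does not reproduce.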

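The MCADL hot/cold split, with RCDL-style replica placement for hot data, can be sketched as follows. The sketch assumes that the storage devices form a logical ring in which "adjacent" means neighbouring indices, and that "hot" means an access count at or above a threshold. The functions `rcdl_replicas` and `mcadl_place`, the `hot_threshold` and `r` parameters, and the fixed-stride spacing rule are hypothetical, not taken from the paper. Cold data is handed to a pluggable placement function; the hash-mod-n `cold_stub` below is only a stand-in for ADL.

```python
import hashlib
from typing import Callable, Sequence


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def rcdl_replicas(key: str, devices: Sequence[str], r: int) -> list[str]:
    """Place r replicas of `key` on distinct, pairwise non-adjacent devices of a logical ring."""
    n = len(devices)
    if n < 2 * r:
        raise ValueError("need at least 2*r devices to keep replicas non-adjacent")
    stride = n // r  # at least 2, so every replica skips over at least one device
    primary = _hash(key) % n
    return [devices[(primary + j * stride) % n] for j in range(r)]


def mcadl_place(
    key: str,
    access_count: int,
    devices: Sequence[str],
    cold_placement: Callable[[str], list[str]],
    hot_threshold: int = 100,  # hypothetical hot/cold boundary (accesses per window)
    r: int = 3,                # hypothetical replica count for hot data
) -> list[str]:
    """Route hot data to RCDL-style replica placement and cold data to a separate placer."""
    if access_count >= hot_threshold:
        return rcdl_replicas(key, devices, r)
    return cold_placement(key)


devices = [f"disk-{i}" for i in range(8)]


def cold_stub(key: str) -> list[str]:
    # Stand-in for ADL only: plain hash-mod-n, not the weighted consistent-hash layout.
    return [devices[_hash(key) % len(devices)]]


print(mcadl_place("video-42", access_count=500, devices=devices, cold_placement=cold_stub))
print(mcadl_place("log-2019", access_count=3, devices=devices, cold_placement=cold_stub))
```

Spacing replicas `n // r` positions apart keeps every pair of replicas at least two positions apart on the ring whenever `n >= 2r`. Under the ring assumption, the simultaneous failure of two neighbouring devices can therefore destroy at most one replica of any hot object.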