CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data

Emerging large-scale distributed storage systems are faced with the task of distributing petabytes of data among tens or hundreds of thousands of storage devices. Such systems must evenly distribute data and workload to efficiently utilize available resources and maximize system performance, while facilitating system growth and managing hardware failures. We have developed CRUSH, a scalable pseudorandom data distribution function designed for distributed object-based storage systems that efficiently maps data objects to storage devices without relying on a central directory. Because large systems are inherently dynamic, CRUSH is designed to facilitate the addition and removal of storage while minimizing unnecessary data movement. The algorithm accommodates a wide variety of data replication and reliability mechanisms and distributes data according to user-defined policies that enforce separation of replicas across failure domains.
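
To make the idea concrete, the sketch below illustrates the core property the abstract describes: any client can compute, from a shared cluster description and a deterministic hash, which devices hold an object's replicas, with each replica placed in a distinct failure domain and no central directory consulted. This is a minimal illustrative sketch, not the CRUSH algorithm itself; the cluster layout, function names, and the modulo-based selection are assumptions made here for brevity, and unlike real CRUSH (which uses a hierarchical cluster map and specialized bucket types) this naive scheme does not minimize data movement when devices are added or removed.

```python
import hashlib


def stable_hash(*parts) -> int:
    """Deterministic hash shared by every client, so no central directory is needed."""
    data = ":".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def place(object_id: str, cluster: dict[str, list[str]], replicas: int = 3) -> list[str]:
    """Choose `replicas` devices for an object, each in a distinct failure domain.

    `cluster` maps a failure domain (here, a rack) to its devices; the mapping
    is pseudorandom but fully determined by the object ID and the cluster map.
    """
    racks = sorted(cluster)
    chosen: list[str] = []
    used_racks: set[str] = set()
    attempt = 0
    while len(chosen) < replicas and attempt < 10 * replicas:
        # Pick a rack pseudorandomly; retry if it already holds a replica.
        rack = racks[stable_hash(object_id, len(chosen), attempt) % len(racks)]
        attempt += 1
        if rack in used_racks:
            continue
        devices = cluster[rack]
        chosen.append(devices[stable_hash(object_id, rack) % len(devices)])
        used_racks.add(rack)
    return chosen


# Example: three racks of two devices each; every client computes the same mapping.
cluster = {
    "rack-a": ["osd.0", "osd.1"],
    "rack-b": ["osd.2", "osd.3"],
    "rack-c": ["osd.4", "osd.5"],
}
print(place("object-123", cluster))
```

Because the selection depends only on the object identifier and the cluster description, clients need no coordination to agree on placement; the policy constraint (one replica per rack) is enforced by rejecting already-used failure domains during selection.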
