An Efficient Data Location Protocol for Self-organizing Storage Clusters

Component additions and failures are common in large-scale storage clusters in production environments. To improve availability and manageability, we investigate and compare data location schemes for a large self-organizing storage cluster that can quickly adapt to the addition or departure of storage nodes. We further present an efficient location scheme that treats small and large file blocks differently, reducing management overhead compared to uniform strategies. In our protocol, small blocks, which are typically numerous, are placed through consistent hashing. Large blocks, which are far fewer in practice, are placed through a usage-based policy, and their locations are tracked by Bloom filters. The proposed scheme improves storage utilization even when cluster nodes are non-uniform. To achieve high scalability and fault resilience, the protocol is fully distributed, relies only on soft state, and supports data replication. We demonstrate the effectiveness and efficiency of this protocol through trace-driven simulation.
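The two-tier placement idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the size threshold, the virtual-node count, the Bloom filter parameters, the "lowest fraction of capacity used" policy, and all class and method names (`HashRing`, `BloomFilter`, `Cluster`, `place`, `candidates`) are assumptions made for illustration only. Replication and the soft-state exchange of filters between nodes are omitted.

```python
import bisect
import hashlib


def _hash(key: str, salt: str = "") -> int:
    """Map a string to a 64-bit integer (SHA-1 is an illustrative choice)."""
    digest = hashlib.sha1((salt + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hashing ring with virtual nodes (used here for small blocks)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def locate(self, block_id: str) -> str:
        # First ring point clockwise from the block's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(block_id)) % len(self._ring)
        return self._ring[idx][1]


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits  # assumed to be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        return [_hash(key, salt=str(i)) % self.num_bits
                for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class Cluster:
    """Hypothetical cluster view: node name -> capacity (arbitrary units)."""

    def __init__(self, capacities, large_threshold=64 * 1024):
        self.capacities = dict(capacities)
        self.used = {node: 0 for node in self.capacities}
        self.ring = HashRing(self.capacities)
        self.filters = {node: BloomFilter() for node in self.capacities}
        self.large_threshold = large_threshold  # assumed cut-off, not from the paper

    def place(self, block_id: str, size: int) -> str:
        if size < self.large_threshold:
            # Small block: location is a pure function of the block ID.
            node = self.ring.locate(block_id)
        else:
            # Large block: assumed usage-based policy, pick the node with the
            # lowest fraction of its capacity in use, and record the block in
            # that node's Bloom filter so it can be found later.
            node = min(self.capacities,
                       key=lambda n: self.used[n] / self.capacities[n])
            self.filters[node].add(block_id)
        self.used[node] += size
        return node

    def candidates(self, block_id: str):
        """Nodes worth probing: Bloom filter matches plus the hash-ring home."""
        hits = [n for n, f in self.filters.items() if block_id in f]
        home = self.ring.locate(block_id)
        return hits if home in hits else hits + [home]
```

In this sketch a lookup does not need to know whether a block is small or large: it probes every node whose filter matches, then falls back to the node chosen by the hash ring. A Bloom filter match may be a false positive, so a candidate node can turn out not to hold the block.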
