Hash-based labeling techniques for storage scaling

Abstract. Scalable storage architectures allow storage devices to be added or removed in order to increase storage capacity and bandwidth or to retire older devices. Assuming random placement of data objects across the storage devices of a storage pool, our optimization objective is to redistribute a minimum number of objects after the pool is scaled. In addition, the distribution must remain uniform after redistribution, so that the load stays balanced. Moreover, redistributed objects should be retrievable efficiently during normal operation: in one I/O access and with low-complexity computation. To achieve this, we propose an algorithm called random disk labeling (RDL), based on double hashing, in which storage can be added or removed without any increase in complexity. We compare RDL with other proposed techniques and demonstrate its effectiveness through experimentation.
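The abstract does not give the details of RDL, so the following is only a minimal sketch of how a double-hashing placement of this kind might work, not the authors' algorithm. It assumes a label space of prime size M, a mapping from occupied labels to disks, and a probe sequence per object that stops at the first label owned by a disk. The names `M`, `_h`, `locate`, and `moved_keys`, and the use of SHA-256 as the hash, are all hypothetical choices made for illustration.

```python
import hashlib

M = 101  # hypothetical label-space size; prime, so every probe step cycles through all labels


def _h(key: str, salt: str) -> int:
    """Hypothetical hash helper: a 64-bit integer derived from SHA-256."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def locate(key: str, labels: dict, m: int = M) -> str:
    """Return the disk for `key` by probing labels with double hashing.

    `labels` maps an occupied label (0 <= label < m) to a disk name.
    The probe sequence is start, start + step, start + 2*step, ... (mod m).
    """
    if not labels:
        raise ValueError("the storage pool has no disks")
    start = _h(key, "start") % m
    step = 1 + _h(key, "step") % (m - 1)  # step lies in 1..m-1, never 0
    for i in range(m):
        slot = (start + i * step) % m
        if slot in labels:
            return labels[slot]
    # Not reachable when m is prime and at least one in-range label is occupied.
    raise RuntimeError("no occupied label found")


def moved_keys(keys, old_labels: dict, new_labels: dict) -> list:
    """List the keys whose disk differs between two label assignments."""
    return [k for k in keys if locate(k, old_labels) != locate(k, new_labels)]
```

Under these assumptions, adding a disk means giving it an unused label: an object moves only if its probe sequence reaches the new label before the label it used previously, so every moved object lands on the new disk. Removing a disk frees its label, and only the objects stored on that disk continue probing to another one. In both cases a lookup is a purely in-memory computation followed by a single disk access, which is the property the abstract asks for.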
