Optimal Distributed Declustering Using Replication

A common technique for improving the performance of database query retrieval is to decluster the database across multiple disks so that retrievals can be parallelized. In this paper we focus on answering range queries over a multidimensional database in which each dimension is divided uniformly to obtain tiles that are placed on different disks; a significant amount of research has addressed how to place the records on disks so as to minimize the retrieval time. Recently, the idea of using replication (i.e., placing records on more than one disk) to improve performance has been introduced. Replication has two goals: i) to minimize the retrieval time and ii) to minimize the scheduling overhead of determining which disk serves each record when processing a query. The previously known replicated declustering schemes with low retrieval times are randomized, and one of the primary advantages of randomized schemes is that, with high probability, they balance the load evenly among the disks for large queries. In this paper we introduce a new class of replicated placement schemes, called the shift schemes, that i) are deterministic, ii) have retrieval performance comparable to that of the randomized schemes, iii) have a strictly optimal retrieval time for all large queries, and iv) admit a more efficient query scheduling algorithm than the randomized placements. Furthermore, we present experimental results suggesting that the shift schemes have better average retrieval times than the randomized schemes.
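To make the two goals concrete, the following minimal sketch measures retrieval time as the largest number of tiles any single disk must serve for a 2-D range query, and compares it with the lower bound of ceil(|Q| / M) for a query of |Q| tiles on M disks. It is an illustration under stated assumptions, not the shift schemes of this paper: `disk_modulo` is the classic single-copy placement (i + j) mod M, `offset_copy` is a hypothetical second copy invented here, and the scheduler is a simple greedy heuristic (send each tile to its currently least-loaded candidate disk), which is not guaranteed to be optimal.

```python
from itertools import product
from math import ceil


def disk_modulo(i, j, num_disks):
    """Classic single-copy placement: tile (i, j) goes to disk (i + j) mod M."""
    return (i + j) % num_disks


def offset_copy(i, j, num_disks):
    """Hypothetical second copy, offset by M // 2 (illustration only;
    this is NOT the shift scheme defined in the paper)."""
    return (i + j + num_disks // 2) % num_disks


def tiles(query):
    """Enumerate the tiles of a query ((r0, r1), (c0, c1)), half-open ranges."""
    (r0, r1), (c0, c1) = query
    return product(range(r0, r1), range(c0, c1))


def lower_bound(query, num_disks):
    """No schedule can beat ceil(|Q| / M) parallel disk accesses."""
    (r0, r1), (c0, c1) = query
    return ceil((r1 - r0) * (c1 - c0) / num_disks)


def retrieval_time(query, num_disks, placements):
    """Greedy schedule: each tile is read from the least-loaded disk
    among those holding a copy. Returns the maximum load on any disk."""
    load = [0] * num_disks
    for i, j in tiles(query):
        candidates = [place(i, j, num_disks) for place in placements]
        chosen = min(candidates, key=lambda d: load[d])
        load[chosen] += 1
    return max(load)


query = ((0, 2), (0, 2))  # a 2 x 2 block of tiles
print(lower_bound(query, 4))                                 # 1
print(retrieval_time(query, 4, [disk_modulo]))               # 2
print(retrieval_time(query, 4, [disk_modulo, offset_copy]))  # 1
```

With a single copy, two of the four tiles land on the same disk, so the query needs two parallel accesses; with the extra copy, the greedy scheduler reaches the lower bound of one access for this particular query. The per-tile choice made by the scheduler is the scheduling overhead referred to in goal ii).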
