Improved Bounds and Schemes for the Declustering Problem

The declustering problem is to allocate given data to storage devices working in parallel in such a manner that typical requests find their data evenly distributed among the devices. Using deep results from discrepancy theory, we improve previous work of several authors concerning rectangular queries of higher-dimensional data. For this problem, we give a declustering scheme with an additive error of \(O_d(\log^{d-1} M)\) independent of the data size, where \(d\) is the dimension, \(M\) is the number of storage devices, and \(d-1\) is not larger than the smallest prime power in the canonical decomposition of \(M\). Thus, in particular, our schemes work for arbitrary \(M\) in two and three dimensions, and for arbitrary \(M \ge d-1\) that is a power of two. These cases seem to be the most relevant in applications. For a lower bound, we show that a recent proof of an \(\Omega_d(\log^{\frac{d-1}{2}} M)\) bound contains a critical error. Using an alternative approach, we establish this bound.
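To make the notion of additive error concrete, the following minimal sketch measures it by brute force on a small two-dimensional grid. It assumes the usual definition (for a rectangular query \(Q\), the largest number of cells of \(Q\) on one device minus \(\lceil |Q|/M \rceil\), maximized over all queries). The names `additive_error` and `disk_modulo` are hypothetical, and the disk-modulo scheme \((x + y) \bmod M\) serves only as a simple baseline; it is not the discrepancy-based scheme of this paper.

```python
from itertools import product
from math import ceil


def additive_error(n, M, scheme):
    """Worst additive error of `scheme` over all axis-parallel
    rectangles in an n x n grid (brute force, small n only).

    Assumed definition: max over rectangles Q of
    (largest per-device cell count in Q) - ceil(|Q| / M).
    """
    worst = 0
    for x0, y0 in product(range(n), repeat=2):
        for x1, y1 in product(range(x0, n), range(y0, n)):
            counts = [0] * M
            for x in range(x0, x1 + 1):
                for y in range(y0, y1 + 1):
                    counts[scheme(x, y)] += 1
            size = (x1 - x0 + 1) * (y1 - y0 + 1)
            worst = max(worst, max(counts) - ceil(size / M))
    return worst


def disk_modulo(M):
    """Baseline scheme (illustrative, not the paper's): cell (x, y)
    is stored on device (x + y) mod M."""
    return lambda x, y: (x + y) % M


if __name__ == "__main__":
    for M in (2, 3, 4, 5):
        print(M, additive_error(6, M, disk_modulo(M)))
```

For \(M = 2\) the baseline is a checkerboard, so every rectangle is split as evenly as possible and the measured error is zero. For \(M = 4\), a \(2 \times 2\) query already places two cells on one device, so the error is at least one.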
