Improved bounds and schemes for the declustering problem

The declustering problem is to allocate given data to storage devices working in parallel in such a manner that typical requests find their data evenly distributed over the devices. Using deep results from discrepancy theory, we improve previous work of several authors concerning range queries to higher-dimensional data. We give a declustering scheme with an additive error of $O_d(\log^{d-1} M)$ independent of the data size, where $d$ is the dimension, $M$ is the number of storage devices, and $d - 1$ does not exceed the smallest prime power in the canonical decomposition of $M$ into prime powers. In particular, our schemes work for arbitrary $M$ in dimensions two and three. For general $d$, they work for all $M \ge d - 1$ that are powers of two. Concerning lower bounds, we show that a recent proof of an $\Omega_d(\log^{(d-1)/2} M)$ bound contains an error. We close the gap in the proof and thus establish the bound.
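To make the notion of additive error concrete, the following is a minimal sketch under stated assumptions, not the scheme of this paper. It assumes the usual definition: for a range query $Q$ on a grid of data tiles, the error is the largest number of tiles of $Q$ held by one device minus the ideal $\lceil |Q|/M \rceil$, maximized over all axis-parallel rectangles $Q$. The function names `additive_error` and `disk_modulo` are hypothetical, and the classic disk-modulo rule $(x + y) \bmod M$ is used only as a simple stand-in scheme.

```python
from math import ceil


def disk_modulo(x, y, M):
    """Stand-in scheme (an assumption, not the paper's): tile (x, y) goes to device (x + y) mod M."""
    return (x + y) % M


def additive_error(scheme, n, M):
    """Brute-force worst-case additive error of `scheme` over all
    axis-parallel rectangular queries in an n x n grid with M devices."""
    worst = 0
    for x0 in range(n):
        for x1 in range(x0, n):
            for y0 in range(n):
                counts = [0] * M  # tiles of the current query per device
                for y1 in range(y0, n):
                    # Grow the query [x0, x1] x [y0, y1] by the tiles with y = y1.
                    for x in range(x0, x1 + 1):
                        counts[scheme(x, y1, M)] += 1
                    size = (x1 - x0 + 1) * (y1 - y0 + 1)
                    # Busiest device versus the ideal even split.
                    worst = max(worst, max(counts) - ceil(size / M))
    return worst
```

For instance, `additive_error(disk_modulo, 2, 4)` evaluates to 1: the full 2 x 2 query places two tiles on device 1, while the ideal is one tile per device. The brute force is only practical for small grids; it is meant to illustrate the quantity being bounded, not to evaluate schemes at scale.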
