Threshold Based Declustering in High Dimensions

Declustering techniques reduce query response times through parallel I/O by distributing data among multiple devices. Except in a few cases, it is not possible to find declustering schemes that are optimal for all spatial range queries. As a result, most research on declustering has focused on finding schemes with low worst-case additive error. Recently, constrained declustering was proposed, which maximizes the threshold k such that all spatial range queries of at most k buckets are optimal. In this paper, we extend constrained declustering to high dimensions. We investigate high-dimensional bound diagrams, which are used to provide an upper bound on the threshold, and propose a method to find good threshold-based declustering schemes in high dimensions. We show that using replicated declustering with threshold N, low worst-case additive error can be achieved for many values of N. In addition, we propose a framework to find thresholds in replicated declustering.
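
To make the two quantities above concrete, the following is a minimal brute-force sketch for a small two-dimensional grid. It is not the method of the paper. It assumes the usual convention that a query covering |Q| buckets on N disks is optimal when no disk holds more than ceil(|Q|/N) of its buckets, and it uses a simple disk modulo allocation, (i + j) mod N, purely as an illustration. The function names (`range_queries`, `query_error`, `worst_case_additive_error`, `threshold`) are hypothetical.

```python
from math import ceil

def range_queries(rows, cols):
    """Yield every axis-aligned range query as (r0, r1, c0, c1), bounds inclusive."""
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    yield r0, r1, c0, c1

def query_error(alloc, n_disks, query):
    """Return (query size in buckets, additive error) for one range query.

    Assumption: the optimal response time is ceil(size / n_disks), and the
    additive error is the busiest disk's load minus that optimum.
    """
    r0, r1, c0, c1 = query
    load = [0] * n_disks
    for i in range(r0, r1 + 1):
        for j in range(c0, c1 + 1):
            load[alloc[i][j]] += 1
    size = (r1 - r0 + 1) * (c1 - c0 + 1)
    return size, max(load) - ceil(size / n_disks)

def worst_case_additive_error(alloc, n_disks):
    """Largest additive error over all range queries on the grid."""
    rows, cols = len(alloc), len(alloc[0])
    return max(query_error(alloc, n_disks, q)[1]
               for q in range_queries(rows, cols))

def threshold(alloc, n_disks):
    """Largest k such that every range query with at most k buckets is optimal."""
    rows, cols = len(alloc), len(alloc[0])
    bad_sizes = [size
                 for size, err in (query_error(alloc, n_disks, q)
                                   for q in range_queries(rows, cols))
                 if err > 0]
    # If no query is suboptimal, every query up to the full grid is optimal.
    return min(bad_sizes) - 1 if bad_sizes else rows * cols

# Illustrative allocation only: disk modulo on a 4 x 4 grid with 4 disks.
n = 4
alloc = [[(i + j) % n for j in range(n)] for i in range(n)]
print(worst_case_additive_error(alloc, n), threshold(alloc, n))
```

On this grid, any 2 x 2 query places two of its four buckets on the same disk, so the smallest suboptimal query has 4 buckets and the threshold is 3. Exhaustive enumeration like this is only practical for small grids; it is meant to pin down the definitions, not to find good schemes.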
