Design theoretic approach to replicated declustering

Declustering techniques reduce query response times through parallel I/O by distributing data among multiple devices. Most research on declustering targets spatial range queries and investigates schemes with low additive error. Recently, declustering with replication has been proposed to reduce the additive overhead. Replication significantly reduces the retrieval cost of arbitrary queries. In this paper, we propose a disk allocation and retrieval mechanism for arbitrary queries based on design theory. Using the proposed $c$-copy replicated declustering scheme, $(c-1)k^2 + ck$ buckets can be retrieved using at most $k$ disk accesses. The retrieval algorithm is efficient and asymptotically optimal, with $\Theta(|Q|)$ complexity for a query $Q$. In addition to the deterministic worst-case bound and efficient retrieval, the proposed algorithm handles nonuniform data and high dimensions, supports incremental declustering, and has good fault-tolerance properties.
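The bound and the notion of "retrieval in $k$ disk accesses" can be made concrete with a small sketch. This is not the paper's design-theoretic allocation or its $\Theta(|Q|)$ retrieval procedure, neither of which is spelled out here. Instead, the sketch swaps in a generic augmenting-path check (a bipartite b-matching, equivalent to max-flow with disk capacity $k$), a standard way to find the optimal number of parallel accesses for any replicated allocation. The function names and the input format (each bucket given as the list of disks holding its copies) are hypothetical choices for illustration. Note that with $c = 1$ the formula reduces to $k$, the trivial guarantee without replication.

```python
from typing import List, Sequence


def guaranteed_buckets(c: int, k: int) -> int:
    """Bucket count from the abstract's bound: (c - 1) * k^2 + c * k."""
    return (c - 1) * k * k + c * k


def can_retrieve(query: Sequence[Sequence[int]], num_disks: int, k: int) -> bool:
    """Hypothetical helper: True if every bucket can be read from one of its
    replica disks with no disk serving more than k buckets (k accesses).

    query[b] lists the disks that hold a copy of bucket b.
    """
    load: List[List[int]] = [[] for _ in range(num_disks)]  # buckets assigned per disk

    def augment(b: int, seen: List[bool]) -> bool:
        # DFS for an augmenting path starting at bucket b.
        for d in query[b]:
            if seen[d]:
                continue
            seen[d] = True
            if len(load[d]) < k:          # disk d has spare capacity
                load[d].append(b)
                return True
            # Disk d is full: try moving one of its buckets to another replica.
            for i, other in enumerate(load[d]):
                if augment(other, seen):
                    load[d][i] = b        # b takes the freed slot on d
                    return True
        return False

    # If some bucket has no augmenting path, no valid assignment exists.
    return all(augment(b, [False] * num_disks) for b in range(len(query)))


def min_accesses(query: Sequence[Sequence[int]], num_disks: int) -> int:
    """Hypothetical helper: optimal number of parallel disk accesses."""
    if any(len(disks) == 0 for disks in query):
        raise ValueError("every bucket needs at least one replica disk")
    k = -(-len(query) // num_disks)  # ceil(|Q| / N): no allocation can beat this
    while not can_retrieve(query, num_disks, k):
        k += 1
    return k
```

For example, with three disks and three buckets stored on the disk pairs (0, 1), (0, 2), and (1, 2), `min_accesses([[0, 1], [0, 2], [1, 2]], 3)` returns 1, whereas three buckets that all live only on disk 0 need 3 accesses. Incrementing $k$ from the $\lceil |Q|/N \rceil$ lower bound keeps the sketch simple; a binary search over $k$ would reduce the number of feasibility checks for large queries.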
