Analysis and Comparison of Replicated Declustering Schemes

Declustering distributes data among parallel disks to reduce retrieval cost by exploiting I/O parallelism. Many schemes have been proposed for single-copy declustering of spatial data. More recently, declustering with replication has attracted considerable interest, and several schemes with different properties have been proposed. An in-depth comparison of the major schemes is needed to understand replicated declustering better. In this paper, we analyze the proposed schemes, tune some of their parameters, and compare them for different query types and under different loads. We propose a three-step retrieval algorithm for the compared schemes. For arbitrary queries, the dependent and partitioned allocation schemes perform poorly, while the others perform close to each other. For range queries, the schemes perform similarly, except on smaller queries, where random duplicate allocation (RDA) performs poorly and dependent allocation performs well. For connected queries, partitioned allocation performs poorly and dependent allocation performs well under a light load.
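
To make the notion of retrieval cost under replication concrete, the following is a minimal sketch, not the three-step retrieval algorithm proposed in the paper. It assumes a two-dimensional grid of buckets, exactly two copies per bucket placed on distinct random disks (in the spirit of RDA), and a cost equal to the largest number of buckets any single disk must read. The function names (`rda_allocate`, `range_query`, `feasible`, `retrieval_cost`) are hypothetical and chosen only for illustration; the cost is found by a standard capacity-constrained bipartite matching.

```python
import random
from math import ceil


def rda_allocate(n_rows, n_cols, n_disks, seed=0):
    """Assign every grid bucket to two distinct, randomly chosen disks."""
    rng = random.Random(seed)
    return {(i, j): tuple(rng.sample(range(n_disks), 2))
            for i in range(n_rows) for j in range(n_cols)}


def range_query(r0, c0, height, width):
    """List the buckets covered by a rectangular range query."""
    return [(i, j) for i in range(r0, r0 + height)
            for j in range(c0, c0 + width)]


def feasible(buckets, alloc, n_disks, cap):
    """Check whether each bucket can be read from one of its replicas
    with no disk serving more than `cap` buckets (augmenting paths)."""
    assigned = {d: [] for d in range(n_disks)}  # disk -> buckets it serves

    def try_assign(bucket, seen):
        for d in alloc[bucket]:
            if d in seen:
                continue
            seen.add(d)
            if len(assigned[d]) < cap:
                assigned[d].append(bucket)
                return True
            # Disk is full: try to move one of its buckets to another replica.
            for k, other in enumerate(assigned[d]):
                if try_assign(other, seen):
                    assigned[d][k] = bucket
                    return True
        return False

    return all(try_assign(b, set()) for b in buckets)


def retrieval_cost(buckets, alloc, n_disks):
    """Smallest per-disk load that allows the whole query to be retrieved."""
    cap = ceil(len(buckets) / n_disks)  # lower bound: perfectly even spread
    while not feasible(buckets, alloc, n_disks, cap):
        cap += 1
    return cap


if __name__ == "__main__":
    alloc = rda_allocate(8, 8, n_disks=8, seed=1)
    query = range_query(2, 3, 2, 4)  # 8 buckets; the lower bound is 1
    print(retrieval_cost(query, alloc, 8))
```

The search starts at the lower bound of ceil(|Q|/N) for a query of |Q| buckets on N disks and raises the allowed per-disk load until an assignment exists, so the returned value shows how far a given replicated allocation is from the ideal for that query.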
