Optimal Parallel I/O for Range Queries through Replication

In this paper we study the problem of declustering two-dimensional datasets with replication over parallel devices to improve range query performance. The related problem of declustering without replication has been well studied, and it has been established that strictly optimal declustering schemes do not exist if data is not replicated. Besides the usual problem of identifying a good allocation, the replicated version of the problem must also identify a good retrieval schedule for a given query. We address both problems in this paper. We develop an efficient algorithm for finding a lowest-cost retrieval schedule; this algorithm works for any query, not just range queries. We also present two replicated placement schemes: one that results in a strictly optimal allocation, and another that guarantees, for any range query, a retrieval cost that is at most 1 more than the optimal.
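To make the retrieval-schedule problem concrete, here is a minimal sketch in Python. It is not the algorithm developed in the paper; it is a generic capacity-bounded bipartite matching, shown only to illustrate what a schedule is. It assumes that retrieval cost means the largest number of buckets read from any single disk, and the names `feasible`, `min_cost_schedule`, `replicas`, and `num_disks` are hypothetical.

```python
from math import ceil


def feasible(replicas, num_disks, cap):
    """Try to read every bucket from one of its replica disks, with at most
    `cap` buckets per disk. Returns per-disk bucket lists, or None."""
    assigned = [[] for _ in range(num_disks)]  # buckets read from each disk

    def try_assign(bucket, seen):
        # Augmenting-path search: place `bucket`, relocating others if needed.
        for disk in replicas[bucket]:
            if disk in seen:
                continue
            seen.add(disk)
            if len(assigned[disk]) < cap:
                assigned[disk].append(bucket)
                return True
            # Disk is full: try to move one of its buckets to another replica.
            for i, other in enumerate(assigned[disk]):
                if try_assign(other, seen):
                    assigned[disk][i] = bucket
                    return True
        return False

    for bucket in range(len(replicas)):
        if not try_assign(bucket, set()):
            return None
    return assigned


def min_cost_schedule(replicas, num_disks):
    """Return (cost, schedule) with the smallest possible maximum disk load.

    replicas[b] lists the disks holding a copy of bucket b (assumed layout).
    """
    n = len(replicas)
    if n == 0:
        return 0, [[] for _ in range(num_disks)]
    # No schedule can beat ceil(n / num_disks); raise the cap until one fits.
    for cap in range(ceil(n / num_disks), n + 1):
        schedule = feasible(replicas, num_disks, cap)
        if schedule is not None:
            return cap, schedule
    raise ValueError("some bucket has no replica on any disk")


# Four buckets requested by a query, three disks, two copies of each bucket.
cost, schedule = min_cost_schedule([[0, 1], [0, 1], [0, 2], [0, 2]], 3)
```

In this example every bucket has a copy on disk 0, so reading each bucket from its first copy would cost 4, while the replicas let the load be spread so that no disk serves more than 2 buckets. The linear scan over `cap` keeps the sketch short; a binary search over `cap` would work equally well.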
