Efficient retrieval of multidimensional datasets through parallel I/O

Many scientific and engineering applications process large multidimensional datasets. An important access pattern for these applications is the range query: retrieval of the data that falls within ranges of values in multiple dimensions. The performance of such queries is limited largely by high disk latencies. Tiling the data and distributing the tiles across multiple disks is an effective technique for improving performance through parallel I/O, and how the tiles are distributed across the disks is an important factor in the gains achieved. Several schemes for declustering multidimensional data to improve the performance of range queries have been proposed in the literature. We extend the class of cyclic schemes, which were developed earlier for two-dimensional data, to multiple dimensions. We establish important properties of cyclic schemes and use them to reduce the search space for finding good declustering schemes within the class. Through experimental evaluation, we establish that the cyclic schemes are superior to other declustering schemes, including the state of the art, in terms of both degree of parallelism and robustness.
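To make the idea of a cyclic declustering scheme concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual form of a cyclic allocation, in which the tile at grid position (i1, ..., id) is placed on disk (a1*i1 + ... + ad*id) mod M for a fixed coefficient vector; the abstract does not spell out this formula. The function names `cyclic_disk`, `query_cost`, and `optimal_cost` are hypothetical, and the cost of a range query is taken to be the largest number of its tiles that fall on any single disk.

```python
from collections import Counter
from itertools import product
from math import ceil


def cyclic_disk(tile, coeffs, num_disks):
    """Disk for a tile under an assumed cyclic scheme: (sum of a_k * i_k) mod M."""
    return sum(a * i for a, i in zip(coeffs, tile)) % num_disks


def query_cost(lo, hi, coeffs, num_disks):
    """Largest number of tiles that any one disk must read for the range query.

    lo and hi are inclusive tile coordinates of the query's corners.
    """
    ranges = [range(l, h + 1) for l, h in zip(lo, hi)]
    load = Counter(cyclic_disk(t, coeffs, num_disks) for t in product(*ranges))
    return max(load.values())


def optimal_cost(lo, hi, num_disks):
    """Lower bound on the cost: ceil(number of tiles / number of disks)."""
    tiles = 1
    for l, h in zip(lo, hi):
        tiles *= h - l + 1
    return ceil(tiles / num_disks)


# A 2 x 2 query on 5 disks: the choice of coefficients changes the cost.
good = query_cost((0, 0), (1, 1), coeffs=(1, 2), num_disks=5)
poor = query_cost((0, 0), (1, 1), coeffs=(1, 1), num_disks=5)
bound = optimal_cost((0, 0), (1, 1), num_disks=5)
print(good, poor, bound)
```

In this small case the coefficients (1, 2) spread the four tiles over four different disks, while (1, 1) puts two tiles on the same disk. Comparing such costs against the lower bound over many queries is one way to picture the search for good coefficient vectors within the class.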
