Study of scalable declustering algorithms for parallel grid files

The efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations, such as long-running time-dependent simulations that periodically generate snapshots of their state. The main challenge in handling such datasets efficiently is to minimize the response time of multidimensional range queries. The grid file is one of the well-known access methods for multidimensional and spatial data. We investigate effective and scalable declustering techniques for grid files, with the primary goal of minimizing response time and the secondary goal of maximizing the fairness of data distribution across disks. The main contributions of this paper are (1) the analytic and experimental evaluation of existing index-based declustering techniques and their extensions to grid files; and (2) the development of a proximity-based declustering algorithm called 'minimax', which is experimentally shown to scale and to consistently achieve better response time than the available algorithms while maintaining a perfect distribution of data across disks.
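The two families of techniques mentioned above can be contrasted with a small sketch. It assumes a regular 2-D grid of buckets rather than a real grid file, and it uses the usual declustering cost model: a range query's response time is the number of buckets read from the busiest disk, so the best achievable value is ceil(q/M) for q buckets on M disks. `disk_modulo` is the classic index-based scheme (sum of the bucket coordinates mod M). `minimax_sketch` is only an approximation of the proximity-based idea. Disks take turns, and on each turn a disk takes the unassigned bucket whose maximum proximity to the buckets it already holds is smallest. The `proximity` function, the seeding rule and all function names are hypothetical illustrations, not the paper's definitions.

```python
from itertools import product

def disk_modulo(cell, num_disks):
    """Index-based declustering (Disk Modulo): sum of coordinates mod M."""
    return sum(cell) % num_disks

def proximity(a, b):
    """Hypothetical proximity: larger for buckets that are closer together,
    i.e. more likely to be hit by the same range query."""
    dist = max(abs(x - y) for x, y in zip(a, b))  # Chebyshev distance
    return 1.0 / (1 + dist)

def minimax_sketch(cells, num_disks):
    """Greedy round-robin, proximity-based assignment (illustrative only).

    On its turn, a disk takes the unassigned bucket whose *maximum*
    proximity to the buckets it already holds is *minimal*. Taking turns
    keeps per-disk bucket counts within one of each other.
    """
    unassigned = list(cells)
    assignment = {}
    held = [[] for _ in range(num_disks)]
    # Assumed seeding rule: give each disk one of the first M buckets.
    for d in range(num_disks):
        if not unassigned:
            break
        seed = unassigned.pop(0)
        assignment[seed] = d
        held[d].append(seed)
    d = 0
    while unassigned:
        best = min(unassigned,
                   key=lambda c: max(proximity(c, h) for h in held[d]))
        unassigned.remove(best)
        assignment[best] = d
        held[d].append(best)
        d = (d + 1) % num_disks
    return assignment

def range_query(lo, hi):
    """All bucket coordinates in the inclusive box lo..hi."""
    return list(product(*(range(l, h + 1) for l, h in zip(lo, hi))))

def response_time(assignment, query_cells, num_disks):
    """Buckets read from the busiest disk (parallel I/O cost of a query)."""
    load = [0] * num_disks
    for c in query_cells:
        load[assignment[c]] += 1
    return max(load)

# Usage: an 8x8 grid of buckets declustered over 4 disks.
N, M = 8, 4
cells = [(i, j) for i in range(N) for j in range(N)]
dm = {c: disk_modulo(c, M) for c in cells}
mm = minimax_sketch(cells, M)
q = range_query((2, 2), (3, 4))   # a 2x3 range query (6 buckets)
optimal = -(-len(q) // M)         # ceil(6 / 4)
print("DM:", response_time(dm, q, M),
      "minimax sketch:", response_time(mm, q, M),
      "lower bound:", optimal)
```

Evaluating a scheme means averaging this response time over many query shapes and positions, not a single query. The round-robin loop is what gives the proximity-based sketch its balanced per-disk load regardless of how the proximity function is chosen.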
