A similarity graph-based approach to declustering problems and its application towards parallelizing grid files

We propose a new similarity-based technique for declustering data. The proposed method can adapt to available information about query distributions, data distributions, data sizes and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that a pair of data items that are to be accessed together by queries are allocated to distinct disks. We show that the proposed method can achieve optimal speed-up for a query set whenever any other declustering method exists that achieves the optimal speed-up. Experiments in parallelizing grid files show that the proposed method outperforms mapping-function-based methods for interesting query distributions as well as for non-uniform data distributions.
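The idea of size-constrained max-cut partitioning over a similarity graph can be illustrated with a minimal sketch. This is not the algorithm from the paper. It assumes a simple similarity measure (the number of queries that access both items) and a greedy heuristic that places each item on the disk where it shares the least similarity with items already there. The names `build_similarity`, `greedy_decluster`, `cut_weight`, `num_disks` and `capacity` are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def build_similarity(queries):
    """Edge weight = number of queries touching both items (assumed measure)."""
    weights = {}
    for query in queries:
        for a, b in combinations(sorted(set(query)), 2):
            weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights


def greedy_decluster(items, weights, num_disks, capacity):
    """Greedy heuristic (an assumption, not the paper's method).

    Each item goes to the non-full disk where it adds the least
    same-disk similarity; ties go to the less loaded disk.
    """
    if num_disks * capacity < len(items):
        raise ValueError("total capacity is smaller than the number of items")

    def w(a, b):
        # Edges are stored with the smaller endpoint first.
        return weights.get((a, b) if a <= b else (b, a), 0)

    # Place heavily connected items first; they are the hardest to separate.
    degree = {i: sum(w(i, j) for j in items if j != i) for i in items}
    assignment = {}
    loads = [0] * num_disks
    for item in sorted(items, key=lambda i: (-degree[i], i)):
        best_key, best_disk = None, None
        for disk in range(num_disks):
            if loads[disk] >= capacity:
                continue
            cost = sum(w(item, other)
                       for other, d in assignment.items() if d == disk)
            key = (cost, loads[disk])
            if best_key is None or key < best_key:
                best_key, best_disk = key, disk
        assignment[item] = best_disk
        loads[best_disk] += 1
    return assignment


def cut_weight(assignment, weights):
    """Total similarity between items placed on different disks."""
    return sum(wt for (a, b), wt in weights.items()
               if assignment[a] != assignment[b])


if __name__ == "__main__":
    queries = [[0, 1], [2, 3], [0, 1]]
    weights = build_similarity(queries)
    placement = greedy_decluster([0, 1, 2, 3], weights, num_disks=2, capacity=2)
    print(placement, cut_weight(placement, weights))
```

A larger cut weight means more co-accessed pairs sit on distinct disks, so more of each query can be served in parallel. A greedy pass like this gives no optimality guarantee; it only shows the objective and the partition-size constraint working together.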
