Partitioning Similarity Graphs: A Framework for Declustering Problems

Declustering problems are well known in databases for parallel computing environments. In this paper, we propose a new similarity-based technique for declustering data. The proposed method can adapt to the available information about query distribution (e.g., size, shape, and frequency) and can work with alternative atomic data types. Furthermore, the method is flexible and can handle alternative data distributions, data sizes, and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that a pair of atomic data items that are frequently accessed together by queries are allocated to distinct disks. We describe the application of the proposed method to parallelizing Grid Files at the data page level. Detailed experiments in this context show that the proposed method adapts to both query distribution and data distribution, and that it outperforms traditional mapping-function-based methods for many interesting query distributions as well as for several non-uniform data distributions.
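The abstract does not specify the partitioning heuristic, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a hypothetical input format in which each query is given as the set of page ids it touches. It also assumes that the weight of an edge (u, v) in the similarity graph is the number of queries that access both pages. Under these assumptions, maximizing the cut weight across disks is equivalent to minimizing the weight of edges whose endpoints share a disk. All function names (`similarity_graph`, `max_cut_decluster`, `internal_weight`, `_conn`) are hypothetical. The heuristic is a generic stand-in: a greedy placement followed by a size-preserving pairwise-swap local search loosely in the spirit of Kernighan-Lin.

```python
from collections import defaultdict
from itertools import combinations
import math


def similarity_graph(queries):
    """Build a weighted similarity graph over data pages.

    Assumption: each query is the set of page ids it accesses, and the weight
    of edge (u, v) counts how many queries access both u and v.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for pages in queries:
        for u, v in combinations(sorted(set(pages)), 2):
            adj[u][v] += 1.0
            adj[v][u] += 1.0
    return adj


def internal_weight(adj, assign):
    """Total weight of edges whose endpoints share a disk (the uncut weight)."""
    return sum(wt for u in adj for v, wt in adj[u].items()
               if u < v and assign[u] == assign[v])


def _conn(adj, node, disk, assign, exclude=None):
    """Weight between `node` and pages already placed on `disk`, ignoring `exclude`."""
    return sum(wt for x, wt in adj[node].items()
               if x != exclude and assign.get(x) == disk)


def max_cut_decluster(pages, adj, num_disks, max_passes=10):
    """Greedy max-cut placement plus swap refinement (hypothetical heuristic).

    Every disk receives at most ceil(len(pages) / num_disks) pages, and swaps
    keep partition sizes unchanged.
    """
    cap = math.ceil(len(pages) / num_disks)
    assign, load = {}, [0] * num_disks
    # Place heavily connected pages first, each on the disk where it adds the
    # least uncut weight; break ties by current load, then by disk index.
    order = sorted(pages, key=lambda p: (-sum(adj[p].values()), p))
    for p in order:
        best = min((d for d in range(num_disks) if load[d] < cap),
                   key=lambda d: (_conn(adj, p, d, assign), load[d], d))
        assign[p] = best
        load[best] += 1
    # Local search: swap two pages on different disks whenever the swap
    # strictly lowers the uncut weight.
    for _ in range(max_passes):
        improved = False
        for u, v in combinations(order, 2):
            a, b = assign[u], assign[v]
            if a == b:
                continue
            before = _conn(adj, u, a, assign) + _conn(adj, v, b, assign)
            after = (_conn(adj, u, b, assign, exclude=v)
                     + _conn(adj, v, a, assign, exclude=u))
            if after < before:
                assign[u], assign[v] = b, a
                improved = True
        if not improved:
            break
    return assign


if __name__ == "__main__":
    # Toy grid file: a 4 x 4 grid of data pages (page id = row * 4 + col),
    # accessed by every 2 x 2 range query with equal frequency.
    n = 4
    pages = list(range(n * n))
    queries = [{(r + dr) * n + (c + dc) for dr in (0, 1) for dc in (0, 1)}
               for r in range(n - 1) for c in range(n - 1)]
    adj = similarity_graph(queries)
    assign = max_cut_decluster(pages, adj, num_disks=4)
    for r in range(n):
        print(" ".join(str(assign[r * n + c]) for c in range(n)))
    print("uncut weight:", internal_weight(adj, assign))
```

The capacity bound stands in for the partition-size constraints mentioned in the abstract. Swaps are used instead of single-page moves so that refinement never violates those sizes.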
