Optimization problems and solution methods in the design of data distribution

Abstract

The design of data distribution requires solving several interrelated optimization problems: data fragmentation and allocation, the optimization and allocation of operations, and the evaluation of the system's performance for given data and operation allocations. Because these problems depend on one another and each can be solved with several different methods, designing data distribution is a difficult task. In this paper, we characterize each problem and the interactions among them, presenting a general framework for the design of data distribution. We present an entity-relationship schema of the design data dictionary, which stores all the information useful during the design, and we use the data dictionary to specify the input and output data of each design problem. We then discuss how the design problems interact. Finally, we present an integrated toolset for the vertical partitioning of relations that offers two different solution methods, DIVIDE and CONQUER: DIVIDE uses a simple analytic model for performance evaluation, while CONQUER uses a detailed cost model. We present the complexity of the methods together with experimental results.
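The abstract contrasts a simple analytic performance model with a detailed cost model but does not give either one. As a rough illustration of what vertical partitioning driven by an analytic model involves, the sketch below is *not* DIVIDE or CONQUER. It assumes a hypothetical cost of "bytes read per tuple plus a fixed overhead per fragment touched", where the overhead stands in for reconstruction joins, and it ignores key replication, network costs, and allocation. It exhaustively enumerates attribute partitions, which is only feasible for a handful of attributes because their number grows as the Bell numbers. The attribute names, widths, workload, and `overhead` value are all made up.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple

# Hypothetical workload format: (frequency, set of attributes the transaction reads).
Transaction = Tuple[int, FrozenSet[str]]


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` into non-empty blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Add `first` to each existing block in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + smaller


def partition_cost(partition: List[FrozenSet[str]], widths: Dict[str, int],
                   transactions: List[Transaction], overhead: float) -> float:
    """Assumed analytic cost: bytes read per tuple plus a fixed charge per fragment touched."""
    total = 0.0
    for freq, attrs in transactions:
        # A transaction must read every fragment holding at least one of its attributes.
        touched = [frag for frag in partition if frag & attrs]
        bytes_read = sum(widths[a] for frag in touched for a in frag)
        total += freq * (bytes_read + overhead * len(touched))
    return total


def best_partition(widths: Dict[str, int], transactions: List[Transaction],
                   overhead: float) -> Tuple[List[FrozenSet[str]], float]:
    """Exhaustively search all vertical partitionings; feasible only for few attributes."""
    best: List[FrozenSet[str]] = []
    best_cost = float("inf")
    for blocks in set_partitions(sorted(widths)):
        partition = [frozenset(b) for b in blocks]
        cost = partition_cost(partition, widths, transactions, overhead)
        if cost < best_cost:
            best, best_cost = partition, cost
    return best, best_cost


# Made-up example: attribute widths in bytes and a two-transaction workload.
widths = {"id": 4, "name": 20, "address": 60, "salary": 8}
transactions = [
    (100, frozenset({"id", "salary"})),   # frequent, narrow
    (5, frozenset({"name", "address"})),  # rare, wide
]
best, cost = best_partition(widths, transactions, overhead=10)
print(sorted(sorted(frag) for frag in best), cost)
# prints: [['address', 'name'], ['id', 'salary']] 2650.0
```

With these invented numbers the sketch separates the frequently read, narrow pair (id, salary) from the rarely read, wide pair (name, address). Increasing `overhead` penalizes transactions that must touch several fragments, which is the trade-off any vertical partitioning cost model has to balance against reading unneeded attributes.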