A Graph Theoretical Approach to Determine a Join Reducer Sequence in Distributed Query Processing

Semijoins have traditionally been relied upon to reduce the cost of data transmission in distributed query processing. However, judiciously applying join operations as reducers can further reduce the amount of data transmission required. We therefore explore the use of join operations as reducers in distributed query processing. We first show that the problem of determining a sequence of join operations for a query can be transformed into that of finding a specific type of set of cuts of the corresponding query graph, where a cut of a graph is a partition of the nodes in that graph. Using this concept, we then prove that the problem of determining the optimal sequence of join operations for a given query graph is of exponential complexity, which justifies the use of heuristic approaches to solve it. By mapping the problem of determining a sequence of join reducers onto that of finding a set of cuts, we develop efficient heuristic algorithms, for tree query graphs and for general query graphs respectively, that determine a join reducer sequence for distributed query processing. The algorithms are based on the divide-and-conquer paradigm and run in polynomial time. Simulation is performed to evaluate these algorithms.
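To make the notion of a cut concrete, the following minimal Python sketch represents a query graph with relations as nodes and join predicates as edges, then checks whether two node sets form a cut (a partition of the nodes) and lists the join edges that cross it. The relation names, the example graph, and the helper names `is_cut` and `crossing_edges` are illustrative assumptions; the abstract does not specify which particular type of set of cuts the paper uses, so this shows only the basic definition.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def is_cut(nodes: Set[str], left: Set[str], right: Set[str]) -> bool:
    """True if (left, right) partitions nodes into two non-empty, disjoint parts."""
    return bool(left) and bool(right) and left.isdisjoint(right) and (left | right) == nodes


def crossing_edges(edges: List[Edge], left: Set[str], right: Set[str]) -> List[Edge]:
    """Join edges with one endpoint on each side of the cut."""
    return [
        (u, v)
        for (u, v) in edges
        if (u in left and v in right) or (u in right and v in left)
    ]


# Hypothetical tree (chain) query graph: R1 - R2 - R3 - R4.
nodes = {"R1", "R2", "R3", "R4"}
edges = [("R1", "R2"), ("R2", "R3"), ("R3", "R4")]

# One cut of the graph: join within each side first, then across the cut.
left, right = {"R1", "R2"}, {"R3", "R4"}

print(is_cut(nodes, left, right))
print(crossing_edges(edges, left, right))
```

Under this reading, cutting each side again in a divide-and-conquer fashion yields a nested grouping of relations, which is one way a set of cuts can correspond to an order of join operations.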
